Abstract

Motivated by the need for robust and fast distributed computation in highly dynamic Peer-to-Peer
(P2P) networks, we study algorithms for the fundamental distributed agreement problem.
P2P networks are highly dynamic networks that experience heavy node churn (i.e., nodes join
and leave the network continuously over time). Our goal is to design fast algorithms (running in 
a small number of rounds) that guarantee, despite high node churn rate, that almost all nodes 
reach a stable agreement. Our main contributions are randomized distributed algorithms that
guarantee stable almost-everywhere agreement with high probability even under high adversarial
churn in a polylogarithmic number of rounds. In particular, we present the following results:

1. An $O(\log^2 n)$-round ($n$ is the stable network size) randomized algorithm that achieves
almost-everywhere agreement with high probability under up to linear churn per round
(i.e., $\epsilon n$, for some small constant $\epsilon > 0$), assuming that the churn is controlled by an
oblivious adversary (has complete knowledge and control of what nodes join and leave and
at what time and has unlimited computational power, but is oblivious to the random choices
made by the algorithm).

2. An $O(\log m \log^3 n)$-round randomized algorithm that achieves almost-everywhere agreement
with high probability under up to $\epsilon\sqrt{n}$ churn per round (for some small $\epsilon > 0$), where
$m$ is the size of the input value domain, that works even under an adaptive adversary (that
also knows the past random choices made by the algorithm).

Our algorithms are the first-known, fully-distributed, agreement algorithms that work under 
highly dynamic settings (i.e., high churn rates per step). Furthermore, they are localized (i.e., do 
not require any global topological knowledge), simple, and easy to implement. These algorithms 
can serve as building blocks for implementing other non-trivial distributed computing tasks in 
dynamic P2P networks. 

Keywords: Peer-to-Peer network, Dynamic network, Stable agreement, Distributed algorithm, 
Randomized algorithm, Expander graphs. 
1 Introduction



1.1 Motivation 

Peer-to-peer (P2P) computing has emerged as one of the key networking technologies in recent years,
with many application systems, e.g., Skype, BitTorrent, and Cloudmark. However, many of these
systems are not truly P2P, as they are not fully decentralized; they typically use hybrid P2P along
with centralized intervention. For example, Cloudmark [1] is a large spam detection system used
by millions of people that operates by maintaining a hybrid P2P network; it uses a central authority
to regulate and charge users for participation in the network. A key reason for the lack of fully- 
distributed P2P systems is the difficulty in designing highly robust algorithms for large-scale dynamic 
P2P networks. Indeed, P2P networks are highly dynamic networks characterized by a high degree
of node churn, i.e., nodes continuously join and leave the network. Connections (edges) may be
added or deleted at any time and thus the topology changes very dynamically. In fact, measurement
studies of real-world P2P networks [17, 32, 33] show that the churn rate is quite high: nearly 50% of
peers in real-world networks can be replaced within an hour. (However, despite a large churn rate, 
these studies also show that the total number of peers in the network is relatively stable.) We note 
that peer-to-peer algorithms have been proposed for a wide variety of computationally challenging 
tasks such as collaborative filtering [8], spam detection [1], data mining [11], and worm detection
and suppression [35, 27]. Unfortunately, all algorithms proposed for these problems have
no theoretical guarantees of being able to work in a dynamic network with a large churn rate. This
is a major bottleneck in the implementation and widespread use of these algorithms.

In this paper, we take a step towards designing robust algorithms for large-scale dynamic peer- 
to-peer networks. In particular, we study the fundamental distributed agreement problem in P2P 
networks (the formal problem statement and model is given in Section [2]). An efficient solution 
to the agreement problem can be used as a building block for robust and efficient solutions to 
other problems as mentioned above. However, the distributed agreement problem in P2P networks 
is challenging since the goal is to guarantee almost-everywhere agreement, i.e., almost all nodes^1
should reach consensus, even under high churn rate. The churn rate can be as much as linear per
time step (round), i.e., up to a constant fraction of the stable network size can be replaced per time
step. Indeed, until recently, almost all the work known in the literature (see e.g., [14, 34, 21, 19, 20])
has addressed the almost-everywhere agreement problem only in static (bounded-degree) networks
and these approaches do not work for dynamic networks with changing topology. For example, the 
work of Upfal [33] showed how one can achieve almost-everywhere agreement under up to a linear
number (up to $\epsilon n$, for a sufficiently small $\epsilon > 0$) of byzantine faults in a bounded-degree
expander network ($n$ is the network size). The algorithm required $O(\log n)$ rounds and a polynomial
(in $n$) number of messages; however, the local computation required by each processor is exponential.
Furthermore, the algorithm requires knowledge of the global topology, since at the start, nodes need 
to have this information "hardcoded". Such approaches fail in dynamic networks where both nodes 
and edges can change by a large amount in every round. The work of King et al. [22] is important
in the context of P2P networks, as it was the first to study scalable (polylogarithmic communication
and number of rounds) algorithms for distributed agreement (and leader election) that are tolerant
to byzantine faults. However, as pointed out by the authors, their algorithm works only for static
networks; similar to Upfal's algorithm, the nodes require hardcoded information on the network
topology to begin with, and thus the algorithm does not work when the topology changes. In fact, this work
([22]) raises the open question of whether one can design agreement protocols that can work in highly
dynamic networks with a large churn rate.

1 In sparse, bounded-degree networks, an adversary can always isolate some number of non-faulty nodes; hence,
almost-everywhere agreement is the best one can hope for in such networks [14].






1.2 Our Main Results 



Our first contribution is a rigorous theoretical framework for design and analysis of algorithms for 
highly dynamic distributed systems with churn. We briefly describe the key ingredients of our model 
here. (Our model is described in detail in Section [2].) Essentially, we model a P2P network as a
bounded-degree expander graph whose topology — both nodes and edges — can change arbitrarily 
from round to round and is controlled by an adversary. However, we assume that the total number 
of nodes in the network is stable. The number of node changes per round is called the churn rate or 
churn limit. We consider churn rates up to some $\epsilon n$, where $n$ is the stable network size. Note that
our model is quite general in the sense that we only assume that the topology is an expander at 
every step; no other special properties are assumed. Indeed, expanders have been used extensively 
to model dynamic P2P networks in which the expander property is preserved under insertions and 
deletions (e.g., [25, 30]). Since we do not make assumptions on how the topology is preserved, our
model is applicable to all such expander-based networks. 

We study stable, almost-everywhere agreement in our model. By "almost-everywhere", we mean
that almost all nodes, except possibly $\beta c(n)$ nodes (where $c(n)$ is the order of the churn and $\beta > 0$
is some small constant), should reach agreement on a common value. (This agreed value must be the
input value of some node.) By "stable" we mean that the agreed value is preserved subsequently 
after the agreement is reached. 

Our main contribution is the design and analysis of randomized distributed algorithms that guarantee
stable almost-everywhere agreement with high probability (i.e., with probability $1 - 1/n^{\Omega(1)}$)
even under high adversarial churn in a polylogarithmic number of rounds. Our algorithms also guarantee
stability with high probability. In particular, we present the following results (the precise
theorem statements are given in the respective sections below):

1. (cf. Section |3J) An $O(\log^2 n)$-round ($n$ is the stable network size) randomized algorithm that
achieves almost-everywhere agreement with high probability under up to linear churn per
round (i.e., $\epsilon n$, for some small constant $\epsilon > 0$), assuming that the churn is controlled by an
oblivious adversary (that has complete knowledge of what nodes join and leave and at what 
time, but is oblivious to the random choices made by the algorithm). Our algorithm requires 
only polylogarithmic in n bits to be processed and sent (per round) by each node. 

2. (cf. Section [5]) An $O(\log m \log^3 n)$-round randomized algorithm that achieves almost-everywhere
agreement with high probability under up to $\epsilon\sqrt{n}$ churn per round, for some small $\epsilon > 0$, that
works even under an adaptive adversary (that also knows the past random choices made by
the algorithm). Note that $m$ refers to the size of the domain of input values. Our algorithm
requires up to polynomial in $n$ bits (and up to $O(\log m)$ bits) to be processed and sent (per
round) by each node.

3. (cf. Section EJ) We also show that no deterministic algorithm can guarantee almost-everywhere 
agreement (regardless of the number of rounds), even under constant churn rate. 

To the best of our knowledge, our algorithms are the first-known, fully-distributed, agreement 
algorithms that work under highly dynamic settings. Our algorithms are localized (do not require 
any global topological knowledge), simple, and easy to implement. These algorithms can serve as 
building blocks for implementing other non-trivial distributed computing tasks in P2P networks. 

1.3 Technical Contributions 

The main technical challenge that we have to overcome is designing and analyzing distributed 
algorithms in networks where both nodes and edges can change by a large amount. Indeed, when 
the churn rate is linear, say $\epsilon n$ per round, in a constant ($1/\epsilon$) number of rounds the entire network
can be renewed! 
We derive techniques for information spreading (cf. Section [3]) that help in doing non-trivial
distributed computation in such networks. The first technique that we use is "flooding". We show
that in an expander-based P2P network, even under linear churn rate, it is possible to spread information
by flooding if sufficiently many (a $\beta$ fraction of the order of the churn) nodes initiate the
information spreading. In other words, even an adaptive adversary cannot "suppress" more than a
small fraction of the values. The precise statements and proofs are in Section [3].

To analyze these flooding techniques we introduce the dynamic distance, which describes the 
effective distance between two nodes with respect to the causal influence. We define the notions 
of influence sets and dynamic distance (or flooding time) in dynamic networks with node churn. 
(Similar notions have been defined for dynamic graphs with a fixed set of nodes, e.g., [23, 6].) In
(connected) networks where the nodes are fixed, the effective diameter (e.g., [23]) is always finite. 
In the highly dynamic setting considered here, however, the effective distance between two nodes 
might be infinite, thus we need a more refined definition for influence set and dynamic distance. 

The second technique that we use is "support estimation" (cf. Section [3.4]). Support estimation
is a randomized technique that allows us to estimate the aggregate count (or sum) of values of all 
or a subset of nodes in the network. Support estimation is done in conjunction with flooding and 
uses properties of the exponential distribution (similar to [IDJ EH] ) . Support estimation allows us 
to estimate the aggregate value quite precisely with high probability even under linear churn. But 
this works only for an oblivious adversary; to get similar results for the adaptive case, we need to 
increase the number of bits that can be processed and sent by a node in every round.

Apart from support estimation, we also use our flooding techniques in the agreement algorithm 
for the oblivious case (cf. Algorithm [1]) to sway the decision one way or the other. For the adaptive
case (cf. Algorithm [2]), we use the variance property of a certain probability distribution to achieve
the same effect with constant probability. 

1.4 Other Related Work 

The distributed agreement (or consensus) problem is important in a wide range of applications, 
such as database management, fault-tolerant analysis of aggregate data, and coordinated control of 
multiple agents or peers. There is a long line of research on various versions of the problem with 
many important results (see e.g., [26, 3] and the references therein). The relaxation of achieving
agreement "almost everywhere" was introduced by [14] in the context of fault-tolerance in networks
of bounded degree where all but $O(t)$ nodes achieve agreement despite $t = O(n/\log n)$ faults. This
result was improved by [21], which showed how to guarantee almost everywhere agreement in the 
presence of a linear fraction of faulty nodes. We also refer to the related results of Berman and 
Garay on the butterfly network [7]. 

There has been significant work in designing peer-to-peer networks that are provably robust to 
a large number of Byzantine faults [H Q2J [221 Q2J EI]. These focus only on robustly enabling storage
and retrieval of data items. The problem of achieving almost-everywhere agreement among nodes
in P2P networks is considered by King et al. in [22] in the context of the leader election problem;
essentially, [22] is a sparse network implementation of the full information protocol of [21]. More
specifically, [22] assumes that the adversary corrupts a constant fraction b < 1/3 of the processes 
that are under its control throughout the run of the algorithm. The protocol of [22] guarantees 
that with constant probability an uncorrupted leader will be elected and that a $1 - O(1/\log n)$ fraction
of the uncorrupted processes know this leader. Note that the failure assumption of [22] is quite 
different from the one we use: Even though we do not assume corrupted nodes, the adversary is 
free to subject different nodes to churn in every round. Also note that the algorithm of [22] does 
not work for dynamic networks. 

In the context of agreement problems in dynamic networks, various versions of coordinated 
consensus (where all nodes must agree) have been considered by Kuhn et al. in [23]. The model of
[24] assumes that the nodes are fixed whereas the topology of the network can change arbitrarily
as long as connectivity is maintained. In this sense, the framework we introduce in Section [2] is
more general than the model of [23], as it is additionally applicable to dynamic settings with node 
churn. The same is true for the notions of dynamic distance and influence set that we introduce 
in Section [3.1], which are more general than the corresponding definitions of [23], since in our model
the dynamic distance is not necessarily finite. In fact, according to [23], modeling churn is one of 
the important open problems in the context of dynamic networks. Our paper takes a step in this 
direction. 

In most work on fault-tolerant agreement problems the adversary a priori commits to a fixed set 
of faulty nodes. In contrast, [13] considers an adversary that can corrupt the state of some (possibly
changing) set of $O(\sqrt{n})$ nodes in every round. The median rule of [13] provides an elegant way
to ensure that most nodes stabilize on a common output value within $O(\log n)$ rounds, assuming
a complete communication graph. The median rule, however, only guarantees that this agreement 
lasts for some polynomial number of rounds, whereas we are able to retain agreement ad infinitum. 

Expander graphs and spectral properties have already been applied extensively to improve the 
network design and fault-tolerance in distributed computing (cf. [331 H31 E] ) . Law and Siu [25] provide 
a distributed algorithm for maintaining an expander in the presence of churn with high probability 
by using Hamiltonian cycles. Information spreading in distributed networks is the focus of [9] where 
it is shown that this problem requires $O(\log n)$ rounds in graphs with a certain conductance in the
push/pull model where a node can communicate with a randomly chosen neighbor in every round. 

Aspnes et al. [2] consider information spreading via expander graphs against an adversary, which 
is related to the flooding techniques we derive in Section [3]. More specifically, in [2] there are two
opposing parties, "the alert" and "the worm" (controlled by the adversary), that both try to gain
control of the network. In every round each alerted node can alert a constant number of its neighbors,
whereas each of the worm nodes can infect a constant number of non-alerted nodes in the network.
In [2], Aspnes et al. show that there is a simple strategy to prevent all but a small fraction of nodes
from becoming infected and that, in case the network has poor expansion, the worm will infect almost
all nodes.

The work of [5] shows that, given a network that is initially an expander and assuming some linear 
fraction of faults, the remaining network will still contain a large component with good expansion. 
These results are not directly applicable to dynamic networks with a large amount of churn like the
ones we are considering, as the topology might be changing from round to round, and linear churn per round
essentially corresponds to $O(n \log n)$ total churn after $O(\log n)$ rounds, the minimum amount of time
necessary to solve any non-trivial task in our model.

2 Model and Problem Statement 

We are interested in establishing stable agreement in a dynamic peer-to-peer network in which the 
nodes and the edges change over time. We model dynamism in the network as a family of undirected 
graphs $(G^r)_{r \ge 0}$. Each round $r \ge 1$ starts with network topology $G^{r-1}$. Then, the adversary gets to
change the network from $G^{r-1}$ to $G^r$ (in accordance with the rules outlined below). As is typical, an edge
$(u, v) \in E^r$ indicates that $u$ and $v$ can communicate in round $r$ by passing messages. For the sake
of readability, we use $V^{[r, r+t]}$ as a shorthand for $\bigcap_{i=r}^{r+t} V^i$. Each node $u$ has a unique identifier and
is churned in at some round $r_i$ and churned out at some $r_o > r_i$. More precisely, for each node $u$,
there is a maximal range $[r_i, r_o - 1]$ such that $u \in V^{[r_i, r_o - 1]}$ and for every $r \notin [r_i, r_o - 1]$, $u \notin V^r$.
Any information about the network at large is only learned through the messages that u receives. 
It has no knowledge about who its neighbors will be in the future. Neither does u know when (or 
whether) it will be churned out. Note that we do not assume that nodes have access to perfect
clocks, but we show (cf. Section [3.3]) how the nodes can maintain a global clock. We make the
following assumptions about the kind of changes that our dynamic network can encounter: 

Stable Network Size: For all $r$, $|V^r| = n$, where $n$ is a suitably large positive integer. This
assumption simplifies our analysis. Our algorithms will work correctly as long as the number
of nodes is reasonably stable, say, between $n - \kappa n$ and $n + \kappa n$ for some suitably small value of
$\kappa$. Also, we assume that $n$ (or a constant factor estimate of $n$) is common knowledge among
the nodes in the network.

Churn: For each $r \ge 1$, $|V^r \setminus V^{r-1}| = |V^{r-1} \setminus V^r| \le \mathcal{C} = \epsilon c(n)$, where $\mathcal{C}$ is the churn limit, which
is some fixed $\epsilon > 0$ fraction of the order of the churn $c(n)$; the equality in the above equation
ensures that the network size remains stable. Our work is aimed at high levels of churn up to
a churn limit $\mathcal{C}$ that is linear in $n$, i.e., $c(n) = n$.

Bounded Degree Expanders: The sequence of graphs $(G^r)_{r \ge 0}$ is an expander family with a
vertex expansion of at least $\alpha$. In other words, the adversary must ensure that for every $G^r$
and every $S \subseteq V^r$ such that $|S| \le n/2$, the number of nodes in $V^r \setminus S$ with a neighbor in $S$ is
at least $\alpha|S|$.
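
The three assumptions above can be checked mechanically on tiny instances. The following Python sketch is only an illustration: the function names, the adjacency-dictionary representation, and the brute-force search (exponential in the number of nodes) are assumptions of this example and are not part of the model.

```python
from itertools import combinations


def vertex_expansion(adj):
    """Brute-force vertex expansion of an undirected graph.

    adj maps every node to the set of its neighbors. The result is the
    minimum of |N(S) - S| / |S| over all non-empty S with |S| <= n/2.
    """
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            outside = set().union(*(adj[v] for v in s)) - s
            best = min(best, len(outside) / size)
    return best


def respects_model(prev_adj, cur_adj, n, churn_limit, alpha):
    """Check stable size, bounded churn and expansion for one round."""
    prev_nodes, cur_nodes = set(prev_adj), set(cur_adj)
    joined = cur_nodes - prev_nodes
    left = prev_nodes - cur_nodes
    return (
        len(cur_nodes) == n
        and len(joined) == len(left) <= churn_limit
        and vertex_expansion(cur_adj) >= alpha
    )


def cycle(ids):
    """Cycle graph over a list of at least three node identifiers."""
    k = len(ids)
    return {ids[i]: {ids[i - 1], ids[(i + 1) % k]} for i in range(k)}
```

For instance, replacing one node of a six-node cycle by a fresh node is a legal step for a churn limit of 1 and a hypothetical expansion requirement of 0.5, whereas the same step is rejected when the churn limit is 0.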

A run of a distributed algorithm consists of an infinite number of rounds. We assume the 
following events occur (in order) in every round r: 

1. A set of at most $\mathcal{C}$ nodes is churned in and another set of $\mathcal{C}$ nodes is churned out. The
edges of $G^{r-1}$ may be changed as well, but $G^r$ has to have a vertex expansion of at least $\alpha$.

2. The nodes broadcast messages to their (current) neighbors. 

3. Nodes receive messages broadcast by their neighbors. 

4. Nodes perform computation that can change their state and determine which messages to send
in round $r + 1$.

Bounds on Parameters 

Recall that the churn limit $\mathcal{C} = \epsilon c(n)$, where $\epsilon > 0$ is a constant and $c(n)$ is the churn order. When
$c(n) = n$, $\epsilon$ is the fraction of the nodes churned out/in and therefore we require $\epsilon$ to be less than 1.
However, when $c(n) \in o(n)$, $\epsilon$ can exceed 1. In the remainder of this paper, we consider $\beta$ to be a
small constant independent of $n$, such that

$$\frac{\epsilon(1+\alpha)}{\alpha} < \beta. \tag{1}$$

Moreover, when $c(n) = n$, we expect $\beta$ < j^. The churn expansion ratio $\epsilon(1+\alpha)/\alpha$ presents a fundamental
lower bound for information propagation in our model (cf. Lemma [1]). Finally, we assume
that $n$ is suitably large (cf. Equations [5] and [6]).
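
As a purely illustrative instance of condition (1), with parameter values chosen for this example rather than taken from the analysis, suppose $\epsilon = 0.01$ and $\alpha = 0.5$:

```latex
\frac{\epsilon(1+\alpha)}{\alpha} = \frac{0.01 \cdot 1.5}{0.5} = 0.03,
\qquad\text{so any constant } \beta > 0.03 \text{ satisfies (1).}
```

With linear churn, $c(n) = n$, these hypothetical values mean that up to $1\%$ of the nodes are replaced per round, while almost-everywhere agreement may leave out up to $\beta n$ nodes.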

2.1 Stable Agreement 

We now define the Stable Agreement problem. Each node $v \in V^0$ comes with an input value
associated with it; subsequent new nodes come with value $\bot$. Let $\mathcal{V}$ be the set of all input values
associated with nodes in $V^0$ at the start of round 1. Every node $u$ is equipped with a special
decision variable $\mathit{decision}_u$ (initialized to $\bot$) that can be written at most once. We say that a node
$u$ decides on val when $u$ assigns val to its $\mathit{decision}_u$. Note that this decision is irrevocable, i.e.,
every node can decide at most once in a run of an algorithm. As long as $\mathit{decision}_u = \bot$, we say that
$u$ is undecided. Stable Agreement requires that a large fraction of the nodes come to a stable
agreement on one of the values in $\mathcal{V}$. More precisely, an algorithm solves Stable Agreement in
$R$ rounds if it exhibits the following characteristics in every run, for any fixed $\beta$ adhering to (1).

Validity: If, in some round $r$, node $u \in V^r$ decides on a value val, then val $\in \mathcal{V}$.

Almost Everywhere Agreement: We say that the network has reached strong almost everywhere
agreement by round $R$ if at least $n - \beta c(n)$ nodes in $V^R$ have decided on the same value
val* $\in \mathcal{V}$ and every other node remains undecided, i.e., its decision value is $\bot$. In particular,
no node ever decides on a value val' $\in \mathcal{V}$ in the same run, for val' $\neq$ val*.

Stability: Let $R$ be the earliest round where nodes have reached almost everywhere agreement on
value val*. The agreement is stable if, at every round $r \ge R$, at least $n - \beta c(n)$ nodes in $V^r$
have decided on val*.
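
The three properties can be read as predicates over the per-round decision maps. The Python sketch below is one hedged way to phrase them as a checker; the names `BOTTOM`, `agreed_value`, and `is_stable`, and the dictionary representation of a round, are assumptions of this example. It can only inspect a finite prefix of a run, whereas the definition quantifies over all rounds $r \ge R$.

```python
BOTTOM = object()  # stands for the undecided value


def agreed_value(decisions, inputs, threshold):
    """Return the common decided value, or BOTTOM if the round does not qualify.

    decisions maps each node present in the round to its decision (or BOTTOM),
    inputs is the set of initial input values, and threshold is n - beta*c(n).
    """
    decided = [val for val in decisions.values() if val is not BOTTOM]
    distinct = set(decided)
    if len(distinct) != 1:
        return BOTTOM  # nobody decided, or two different values were decided
    (val,) = distinct
    if val not in inputs:
        return BOTTOM  # validity is violated
    return val if len(decided) >= threshold else BOTTOM


def is_stable(history, inputs, threshold):
    """history lists the decision maps of rounds R, R+1, ... (a finite prefix)."""
    values = [agreed_value(d, inputs, threshold) for d in history]
    return bool(values) and values[0] is not BOTTOM and all(v == values[0] for v in values)
```

A round in which churn has replaced decided nodes by fresh undecided ones still counts, as long as the number of decided nodes stays at or above the threshold.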

We also consider a weaker variant of the above problem that we call Almost Everywhere Binary 
Consensus (or simply, Binary Consensus) where the input values in $\mathcal{V}$ are restricted to $\{0, 1\}$.
Note that for Binary Consensus the Validity property is trivially satisfied except in runs where 
all nodes start with the same input value. 

We consider two types of adversaries for our randomized algorithms. An oblivious adversary 
must commit in advance to the entire sequence of graphs $(G^r)_{r \ge 0}$. In other words, an oblivious
adversary must commit independently of the random choices made by the algorithm. We also
consider the more powerful adaptive adversary that can observe the entire state of the network in
every round $r$ (including all the random choices made until round $r - 1$), and then chooses the nodes
to be churned out/in and how to change the topology of $G^{r+1}$.

3 Techniques for Information Spreading 

Due to the high amount of churn and the dynamically changing network, we use message flooding 
to disseminate and gather information. We now precisely define flooding. Any node can initiate 
a message for flooding. Messages that need to be flooded have an indicator bit bFlood set to 1. 
Each of these messages also contains a terminating condition. The initiating node sends copies of 
the message to itself and its neighbors. When a node receives a message with bFlood set to 1, it 
continues to send copies of that message to itself and its neighbors in subsequent rounds until the 
terminating condition is satisfied. 
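
The flooding rule just described can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the paper's implementation: `snapshots[r]` is taken to be an adjacency dictionary for $G^r$, the terminating condition is modeled as a caller-supplied predicate `done`, and all names are hypothetical.

```python
def flood(snapshots, initiator, start, done):
    """Flood one message through a sequence of graphs.

    snapshots[r] is the adjacency dict of G^r (its keys are V^r).
    The message keeps bFlood = 1 until done(holders, r) returns True,
    which plays the role of the terminating condition.
    Returns the list of holder sets after each simulated round.
    """
    holders = {initiator}
    history = []
    for r in range(start, len(snapshots)):
        graph = snapshots[r]
        # Churn happens first: holders that left the network lose the message.
        holders = {v for v in holders if v in graph}
        # Each holder sends copies to itself and to its current neighbors.
        received = set(holders)
        for v in holders:
            received |= graph[v]
        holders = received
        history.append(set(holders))
        if done(holders, r):
            break
    return history
```

On a static four-node path, for example, a message initiated at one end with the terminating condition "has reached the other end" stops after three rounds.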

3.1 Dynamic Distance and Influence Set 

We define the notion of dynamic distance of a node $v$ from $u$ starting at round $r$, denoted by
$DD_r(u \to v)$. When the subscript $r$ is omitted, we may assume that $r = 1$. Suppose node $u$
joins the network at round $r_u$, and, from round $\max(r_u, r)$ onward, $u$ initiates a message $m$ for
flooding whose terminating condition is: $\langle$has reached $v\rangle$. If $u$ is churned out before $r$, then
$DD_r(u \to v)$ is undefined. Suppose the first of those flooded messages reaches $v$ in round $r + \Delta r$.
Then, $DD_r(u \to v) = \Delta r$. Note that this definition allows $DD_r(u \to v)$ to be infinite under two
scenarios. Firstly, node $v$ may be churned out before any copy of $m$ reaches $v$. Secondly, at each
round, $v$ can be shielded by churn nodes that absorb the flooded messages and are then removed
from the network before they can propagate these messages any further. The influence set of a node
$u$ after $R$ rounds starting at round $r$ is given by:

$$\mathrm{Influence}_r(u, R) = \{v : (DD_r(u \to v) \le R) \wedge (v \in V^{r+R})\}. \tag{2}$$
Note that we require $\mathrm{Influence}_r(u, R) \subseteq V^{r+R}$. Intuitively, we want the influence set of $u$ (in this
dynamic setting) to capture the nodes currently in the network that were influenced by $u$. Note
however that the influence set of a node $u$ is meaningful even after $u$ is churned out. Analogously,
we define $\mathrm{Influence}_r(U, R) = \bigcup_{u \in U} \mathrm{Influence}_r(u, R)$, for any set of nodes $U \subseteq V^r$.
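
To make the two definitions concrete, the following Python sketch computes dynamic distances and influence sets on an explicit list of graph snapshots. It rests on assumptions that are ours, not the paper's: `snapshots[t]` is an adjacency dictionary for $G^t$, a message sent in round $t$ is received in the same round (so the neighbors of $u$ in $G^r$ get distance 0), and unreached nodes are simply absent from the result, which stands for an infinite distance. The helper `shielded_snapshots` is a hypothetical toy instance of the shielding scenario.

```python
def dynamic_distances(snapshots, u, r):
    """Return {v: DD_r(u -> v)} for all nodes reached by flooding from u."""
    dist = {}
    holders = {u}
    for t in range(r, len(snapshots)):
        graph = snapshots[t]
        # Holders that were churned out can no longer forward the message.
        holders = {v for v in holders if v in graph}
        reached = set(holders)
        for v in holders:
            reached |= graph[v]
        for v in reached:
            dist.setdefault(v, t - r)
        holders = reached
    return dist


def influence_set(snapshots, u, r, R):
    """Nodes v with DD_r(u -> v) <= R that are still present in V^(r+R)."""
    dist = dynamic_distances(snapshots[: r + R + 1], u, r)
    present = snapshots[r + R]
    return {v for v, d in dist.items() if d <= R and v in present}


def shielded_snapshots(rounds):
    """Node 0 is separated from node 2 by a shield node replaced every round."""
    snaps = []
    for t in range(rounds):
        shield = 100 + t
        snaps.append({0: {shield}, shield: {0, 2}, 2: {shield}})
    return snaps
```

In the shielded toy instance the message from node 0 is absorbed by a fresh shield node in every round, so node 2 never appears in the distance map, which matches the second scenario for an infinite dynamic distance.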

If we consider only a single node u, an (adaptive) adversary can easily prevent the influence set 
of this node from ever reaching any significant size by simply shielding u with churn nodes that are 
replaced in every round.^2



3.2 Properties of Influence Sets 

We now focus our efforts on characterizing influence sets. This will help us in understanding how we 
can use flooding to spread information in the network. For most of this section we assume
that the network is controlled by an adaptive adversary (cf. Section [2.1]). The following lemma
shows that the number of nodes that we need in order to influence almost all the nodes in the network is
bounded from below by the churn-expansion ratio (cf. Equation (1)):

Lemma 1. Suppose that the adversary is adaptive. Consider any set $U \subseteq V^{r-1}$ (for any $r \ge 1$)
such that $|U| \ge \beta c(n)$. Then, after

$$T = 2\,\frac{\log n - \log c(n) - \log\left(\beta - \frac{\epsilon(1+\alpha)}{\alpha}\right) - 1}{\log(1+\alpha)}$$

number of rounds, it holds that $|\mathrm{Influence}_r(U, T)| > n - \beta c(n)$. When considering linear churn, i.e.,
$c(n) = n$, the bound $T$ becomes a constant independent of $n$. On the other hand, when considering
a churn order of $\sqrt{n}$, we get $T \in O(\log n)$.
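
The dependence of $T$ on the churn order can be explored numerically. The Python sketch below evaluates $T = 2T_1$, with $T_1$ as used in the proof; the choice of base-2 logarithms, the rounding up to an integer number of rounds, and the function name are assumptions of this example.

```python
import math


def lemma1_rounds(n, c, eps, alpha, beta):
    """Evaluate T = 2 * T_1 for network size n and churn order c = c(n)."""
    gap = beta - eps * (1 + alpha) / alpha
    if gap <= 0:
        raise ValueError("condition (1) requires eps * (1 + alpha) / alpha < beta")
    t1 = (math.log2(n) - math.log2(c) - math.log2(gap) - 1) / math.log2(1 + alpha)
    # Rounding up (and clamping at zero) is an assumption of this sketch.
    return 2 * max(0, math.ceil(t1))
```

Calling `lemma1_rounds(n, n, ...)` for different values of `n` returns the same number, reflecting the constant bound under linear churn, whereas `lemma1_rounds(n, math.isqrt(n), ...)` grows with `n`.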

Proof. Our proof assumes that $r = 1$ for simplicity, as the arguments extend quite easily to arbitrary
values of $r$. We proceed in two parts: First we show that the nodes in $U$ influence at least $n/2$
nodes in some $T_1$ rounds. More precisely, we show that $|\mathrm{Influence}(U, T_1)| \ge n/2$. We use vertex
expansion in a straightforward manner to establish this part. Then, in the second part we show
that nodes in $\mathrm{Influence}(U, T_1)$ go on to influence more than $n - \beta c(n)$ nodes. We cannot use the
vertex expansion in a straightforward manner in the second part because the cardinality of the set
that is expanding in influence is larger than $n/2$. Rather, we use a slightly more subtle argument
in which we use vertex expansion going backward in time. The second part requires another $T_1$
rounds. Therefore, the two parts together complete the proof when we set $T = 2T_1$.

To begin the first part, consider $U \subseteq V^0$ at the start of round 1 with $|U| \ge \beta c(n)$. In round 1, up
to $\epsilon c(n)$ nodes in $U$ can be churned out. Subsequently, the remaining nodes in $U$ influence some nodes
outside $U$ as $G^1$ is an expander with vertex expansion at least $\alpha$. More precisely, we can say that
$|\mathrm{Influence}(U, 1)| \ge (\beta c(n) - \epsilon c(n))(1 + \alpha)$. At the start of round 2, the graph changes dynamically
to $G^2$. In particular, up to $\epsilon c(n)$ nodes might be churned out and they may all be in $\mathrm{Influence}(U, 1)$
in the worst case. However, the influenced set will again expand. Therefore, $|\mathrm{Influence}(U, 2)|$
cannot be less than $(|\mathrm{Influence}(U, 1)| - \epsilon c(n))(1 + \alpha) \ge \beta c(n)(1 + \alpha)^2 - \epsilon c(n)(1 + \alpha)^2 - \epsilon c(n)(1 + \alpha)$.
Of course, there will be more churn at the start of round 3 followed by expansion leading to:

$$|\mathrm{Influence}(U, 3)| \ge \left(\beta c(n)(1+\alpha)^2 - \epsilon c(n)(1+\alpha)^2 - \epsilon c(n)(1+\alpha) - \epsilon c(n)\right)(1+\alpha)$$
$$= \beta c(n)(1+\alpha)^3 - \epsilon c(n)(1+\alpha)^3 - \epsilon c(n)(1+\alpha)^2 - \epsilon c(n)(1+\alpha).$$



2 An oblivious adversary can achieve the same effect with constant probability for linear churn. 
This cycle of churn followed by expansion continues and we get the following bound at the end of
some round $i$:

$$|\mathrm{Influence}(U, i)| \ge \beta c(n)(1+\alpha)^i - \epsilon c(n) \sum_{k=1}^{i} (1+\alpha)^k$$
$$= \beta c(n)(1+\alpha)^i + \epsilon c(n)\,\frac{1 - (1+\alpha)^{i+1}}{\alpha} + \epsilon c(n).$$

After

$$T_1 = \frac{\log n - \log c(n) - \log\left(\beta - \frac{\epsilon(1+\alpha)}{\alpha}\right) - 1}{\log(1+\alpha)}$$

rounds, we get

$$|\mathrm{Influence}(U, T_1)| \ge n/2. \tag{3}$$
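
The closed form above can be sanity-checked numerically against the round-by-round argument (lose up to $\epsilon c(n)$ influenced nodes to churn, then expand by a factor of $1 + \alpha$). The Python sketch below does this; the function names and the sample parameters one would pass in are assumptions of this example, and it checks only the arithmetic of the lower bound, not the behavior of any actual network.

```python
def bound_by_recurrence(i, c, eps, alpha, beta):
    """Apply i rounds of 'lose eps*c nodes, then expand by a factor 1 + alpha'."""
    size = beta * c
    for _ in range(i):
        size = (size - eps * c) * (1 + alpha)
    return size


def bound_closed_form(i, c, eps, alpha, beta):
    """beta*c*(1+alpha)^i + eps*c*(1 - (1+alpha)^(i+1))/alpha + eps*c."""
    growth = (1 + alpha) ** i
    return beta * c * growth + eps * c * (1 - growth * (1 + alpha)) / alpha + eps * c


def first_round_reaching(target, c, eps, alpha, beta, limit=10_000):
    """Smallest i whose lower bound is at least target, or None within limit."""
    for i in range(limit + 1):
        if bound_closed_form(i, c, eps, alpha, beta) >= target:
            return i
    return None
```

Calling `first_round_reaching(n / 2, ...)` gives the first round at which the lower bound guarantees that half of the network is influenced, which can be compared with the value of $T_1$.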



Now we move on to the second part of the proof. Let $T = 2T_1$. Clearly, if
$|\mathrm{Influence}(U, T)| > n - \beta c(n)$, we are done. Therefore, for the sake of a contradiction, assume that
$|\mathrm{Influence}(U, T)| \le n - \beta c(n)$. Let $S = V^T \setminus \mathrm{Influence}(U, T)$, i.e., $S$ is the set of nodes in $V^T$ that were not influenced by
$U$ at (or before) round $T$. Clearly, $|S| \ge \beta c(n)$ because we have assumed that
$|\mathrm{Influence}(U, T)| \le n - \beta c(n)$. We will start at round $T$ and work our way backward. For $q \le T$, let $S^q \subseteq V^q$ be the
set of all vertices in $V^q$ that, starting from round $q$, influenced some vertex in $S$ at or before round
$T$. More precisely,

$$S^q = \{s : (s \in V^q) \wedge (\mathrm{Influence}_q(s, T - q) \cap S \ne \emptyset)\}.$$

Suppose $|S^{T_1}| > n/2$. Then

$$S^{T_1} \cap \mathrm{Influence}(U, T_1) \ne \emptyset,$$

since $|\mathrm{Influence}(U, T_1)| \ge n/2$ by (3). Consider a node $s^* \in S^{T_1} \cap \mathrm{Influence}(U, T_1)$. Clearly, $s^*$
was influenced by $U$ and went on to influence some node in $S$ before (or at) round $T$. However, by
definition, no node in $S$ can be influenced by any node in $U$ at or before round $T$. We have thus
reached a contradiction. We are left with showing that $|S^{T_1}| > n/2$.

We start with $S$ and work our way backwards. We know that $|S| \ge \beta c(n) > \beta c(n) - \epsilon c(n)$. We
want to compute the cardinality of $S^{T-1}$. We first focus on an intermediate set $S'$, which we define
as $S' = S \cup \{s' : \exists s \in S, (s, s') \in E^T\}$. Since $G^T$ is an expander, $|S'| \ge |S|(1 + \alpha)$. Furthermore, it is also
clear that each node in $S'$ could influence some node in $S$. Notice that $S' \setminus S^{T-1}$ is the set of nodes
in $S'$ that were churned in only at the start of round $T$. Therefore,

$$|S^{T-1}| \ge |S'| - \epsilon c(n)$$
$$\ge |S|(1+\alpha) - \epsilon c(n)$$
$$> (\beta c(n) - \epsilon c(n))(1+\alpha) - \epsilon c(n)$$
$$= \beta c(n)(1+\alpha) - \epsilon c(n)(1+\alpha) - \epsilon c(n).$$

Continuing to work our way backwards in time, we get

$$|S^{T-2}| > \beta c(n)(1+\alpha)^2 - \epsilon c(n)(1+\alpha)^2 - \epsilon c(n)(1+\alpha) - \epsilon c(n),$$



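or, more generally,

$$|S^{T-i}| > \beta c(n)(1+\alpha)^i - \epsilon c(n) \sum_{j=0}^{i} (1+\alpha)^j$$
$$= \beta c(n)(1+\alpha)^i + \epsilon c(n)\,\frac{1 - (1+\alpha)^{i+1}}{\alpha}$$
$$= \beta c(n)(1+\alpha)^i - \frac{\epsilon c(n)(1+\alpha)^{i+1}}{\alpha} + \frac{\epsilon c(n)}{\alpha}.$$

We now want the value of $i$ for which $|S^{T-i}| > n/2 + \epsilon c(n)/\alpha > n/2$. In other words, we want a value
of $i$ such that

$$\beta c(n)(1+\alpha)^i - \frac{\epsilon c(n)(1+\alpha)^{i+1}}{\alpha} + \frac{\epsilon c(n)}{\alpha} > n/2 + \frac{\epsilon c(n)}{\alpha},$$

which is obtained when $i = T_1$. Therefore, it is easy to see that if we set $T = 2T_1$, we get $|S^{T_1}| > n/2$,
which completes the proof.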



At first glance, it might appear to be counterintuitive that the order of the bound $T$ decreases
with increasing churn. When the adversary has the benefit of churn that is linear in $n$, our bound
on $T$ is a constant, but when the adversary is limited to a churn order of $\sqrt{n}$, we get $T \in O(\log n)$.
This, however, turns out to be fairly natural when we note that the size of the set $U$ of nodes that
we start out with is in proportion to the churn limit.
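
The dependence of $T$ on the churn order can be checked numerically. The following is a minimal sketch, not part of the original analysis: it iterates the worst-case recurrence used in the proof (a set of $\beta c(n)$ nodes grows by a factor $(1+\alpha)$ per round through expansion and loses at most $\epsilon c(n)$ nodes per round to churn) and counts the rounds until the set exceeds $n/2$. The function name `rounds_to_majority` and the parameter values $\alpha = 1$, $\epsilon = 0.01$, $\beta = 0.1$ are illustrative assumptions.

```python
import math


def rounds_to_majority(n, churn_order, alpha=1.0, eps=0.01, beta=0.1):
    """Count rounds until a set of initial size beta*c(n) reaches n/2.

    Per round, the set grows by a factor (1 + alpha) through expansion and
    then loses at most eps*c(n) nodes to churn (the worst case in the proof).
    All parameter values are illustrative assumptions.
    """
    c = churn_order(n)
    size = beta * c
    # The set only grows if expansion outpaces churn, i.e. beta * alpha > eps.
    if size * alpha <= eps * c:
        raise ValueError("set never grows: need beta * alpha > eps")
    rounds = 0
    while size < n / 2:
        size = size * (1 + alpha) - eps * c
        rounds += 1
    return rounds


def linear(n):
    """Churn order c(n) = n."""
    return n


def sqrt_order(n):
    """Churn order c(n) = sqrt(n)."""
    return math.sqrt(n)


if __name__ == "__main__":
    for n in (10**4, 10**6):
        print(n, rounds_to_majority(n, linear), rounds_to_majority(n, sqrt_order))
```

With these illustrative parameters, the linear-churn case needs 3 rounds for both $n = 10^4$ and $n = 10^6$, whereas the $\sqrt{n}$ case needs 10 and 13 rounds respectively, which matches the constant versus logarithmic behaviour described above.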

We say that a node $u \in V^r$ is suppressed for $R$ rounds if $|\mathrm{Influence}_r(u, R)| < n - \beta c(n)$;
otherwise we say it is unsuppressed. The following lemma shows that, given a set with cardinality
at least $\beta c(n)$, some node in that set will be unsuppressed.

Lemma 2. Consider the adaptive adversary. Let $U$ be any subset of $V^r$, $r \ge 1$, such that
$|U| \ge \beta c(n)$. Let $T$ be the bound derived in Lemma 1. There is at least one $u^* \in U$ such that for
some $R \in O(T \log n)$,

$$|\mathrm{Influence}_r(u^*, R)| > n - \beta c(n).$$

In particular, when the order of the churn is $n$, $T$ becomes a constant, and we have $R = O(\log n)$.

Before we proceed with our key arguments of the proof, we state a property of bipartite graphs 
that we will use subsequently. 

Property 1. Let $H = (A, B, E)$ be a bipartite graph in which $|A| > 1$ and every vertex $b \in B$
has at least one neighbor in $A$. There is a subset $A^* \subseteq A$ of cardinality at most $\lceil |A|/2 \rceil$ such that
$|\{b : \exists a^* \in A^* \text{ such that } (a^*, b) \in E\}| \ge \lceil |B|/2 \rceil$.

Proof of Property 1. Consider each node in $A$ to be a unique color. Color each node in $B$ using the
color of a neighbor in $A$ chosen arbitrarily. Now partition $B$ into maximal subsets of nodes with like
colors. Consider the parts of the partition sorted in decreasing order of their cardinalities. We now
greedily choose the first $\lceil |A|/2 \rceil$ colors in the sorted order of parts of $B$. We call the chosen colors
$C$. Clearly, colors in $C$ cover at least as many nodes in $B$ as those not in $C$. Suppose the colors in
$C$ cover fewer than $\lceil |B|/2 \rceil$ nodes in $B$. Then the remaining colors would cover at least $\lceil |B|/2 \rceil$ nodes,
that is, more nodes than the colors in $C$, which is a contradiction. Therefore, colors in $C$ cover at
least $\lceil |B|/2 \rceil$ nodes in $B$. The nodes in $A$ that have the colors in $C$ are the nodes that comprise $A^*$,
thereby completing our proof. □
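
The greedy argument in the proof of Property 1 can be written out directly. The sketch below is only an illustration under stated assumptions: the function name `half_cover` and the dictionary-based graph representation are hypothetical, and the "arbitrary" neighbor used as a color is taken to be the first listed neighbor.

```python
from collections import Counter
from math import ceil


def half_cover(a_nodes, neighbors_of_b):
    """Pick at most ceil(|A|/2) nodes of A that cover at least ceil(|B|/2) nodes of B.

    a_nodes: list of the nodes in A.
    neighbors_of_b: dict mapping each node b in B to a non-empty list of its
    neighbors in A (hypothetical representation of the bipartite graph).
    """
    # Color each b with one of its neighbors in A (arbitrarily: the first one).
    color_of = {b: nbrs[0] for b, nbrs in neighbors_of_b.items()}
    # Size of each color class, i.e. each part of the partition of B.
    class_sizes = Counter(color_of.values())
    # Greedily take the ceil(|A|/2) largest color classes.
    budget = ceil(len(a_nodes) / 2)
    chosen = {a for a, _ in class_sizes.most_common(budget)}
    # Every b whose color was chosen is adjacent to a chosen node.
    covered = {b for b, nbrs in neighbors_of_b.items() if chosen & set(nbrs)}
    return chosen, covered


if __name__ == "__main__":
    A = ["a1", "a2", "a3", "a4"]
    B = {"b1": ["a1"], "b2": ["a1", "a2"], "b3": ["a2"], "b4": ["a3"], "b5": ["a4"]}
    chosen, covered = half_cover(A, B)
    print(sorted(chosen), sorted(covered))
```

In the proof of Lemma 2, $A$ plays the role of the current candidate set $U_i$ and $B$ the role of its collective influence set, so each application halves the candidate set while keeping at least half of the influenced nodes.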

Proof of Lemma 2. Again, our proof assumes $r = 1$ because it generalizes to arbitrary values of
$r$ quite easily. From Lemma 1 we know that the influence of all nodes in $U$ taken together will
reach $n - \beta c(n)$ nodes in $T$ rounds. This does not suffice because we are interested in showing
that there is at least one node in $V^0$ that (individually) influences $n - \beta c(n)$ nodes in $V^R$ for some
$R = O(T \log n)$.

From Lemma 1 we know that $U$ (collectively) will influence at least $n - \beta c(n)$ nodes in $T$ rounds,
i.e.,

$$|\mathrm{Influence}(U, T)| > n - \beta c(n).$$

From Property 1 we know that there is a set $U_1 \subseteq U$ of cardinality at most $\lceil |U|/2 \rceil$ such that

$$|\mathrm{Influence}(U_1, T)| \ge \frac{n - \beta c(n)}{2}.$$

Recalling the upper bound assumed on $\beta$, we know that $|\mathrm{Influence}(U_1, T)| \ge \beta c(n)$. We can again use Lemma 1
to say that $\mathrm{Influence}(U_1, T)$ influences more than $n - \beta c(n)$ nodes in an additional $T$ rounds and,
by transitivity, $U_1$ influences more than $n - \beta c(n)$ nodes after $2T$ rounds. We therefore have
$|\mathrm{Influence}(U_1, 2T)| > n - \beta c(n)$. Again, we can choose a set $U_2 \subseteq U_1$ (using Property 1) that
consists of at most $\lceil |U_1|/2 \rceil$ nodes in $U_1$ such that $|\mathrm{Influence}(U_2, 2T)| \ge \beta c(n)$. Subsequently applying
Lemma 1 extends the influence set of $U_2$ to more than $n - \beta c(n)$ after $3T$ rounds.

In every iteration $i$ of the above argument, the size of the set $U_i$ decreases by a constant fraction
until we are left with a single node $u^* \in U$ such that $|\mathrm{Influence}(u^*, O(\log n)T)| > n - \beta c(n)$. □

Can $\beta c(n)$ (or more) nodes be suppressed for any significant number of (say, $O(T \log n)$) rounds?
This is in immediate contradiction to Lemma 2 because any such suppressed set of nodes must
contain an unsuppressed node! This leads us to the following corollary.

Corollary 1. The number of nodes that can be suppressed for $O(T \log n)$ rounds is less than $\beta c(n)$,
even if the network is controlled by an adaptive adversary.

Corollary 2. Consider an oblivious adversary that must commit to the entire sequence of graphs
in advance. If we choose a node $u$ uniformly at random from $V^0$, with probability at least $1 - \frac{\beta c(n)}{n}$,

$$|\mathrm{Influence}(u, \Omega(T \log n))| > n - \beta c(n).$$

Proof. Let $S \subseteq V^0$ be the set of nodes suppressed for $\Omega(T \log n)$ rounds. Under an oblivious
adversary, the node $u$ chosen uniformly at random from $V^0$ will not be in $S$ with probability at least $1 - \frac{\beta c(n)}{n}$,
and hence, will not be suppressed with that same probability. □

Lemma 3. Consider a dynamic network under linear churn that is controlled by an adaptive
adversary. In some $r \in O(\log n)$ rounds, there is a set of unsuppressed nodes $V^* \subseteq V^0$ of cardinality more
than $(1 - \beta)n$ such that

$$\Big|\bigcap_{v \in V^*} \mathrm{Influence}(v, r)\Big| > (1 - \beta)n.$$

Proof. Let $V^* \subseteq V^0$ be any set of unsuppressed nodes, i.e., in some $c_0 \log n$ rounds for some
constant $c_0$, the influence set of each $v \in V^*$ has cardinality more than $(1 - \beta)n$. Note, however,
that we cannot guarantee that, for any two vertices $v_1$ and $v_2$ in $V^*$,
$|\mathrm{Influence}(v_1, c_0 \log n) \cap \mathrm{Influence}(v_2, c_0 \log n)| > (1 - \beta)n$. Assume for simplicity that $|V^*|$ is a power of 2. Consider
any pair of vertices $\{v_1, v_2\}$, both members of $V^*$. Recalling the upper bound assumed on $\beta$, we can say that
$|\mathrm{Influence}(v_1, c_0 \log n) \cap \mathrm{Influence}(v_2, c_0 \log n)| \ge \beta n$. Therefore, since the intersected set
$\mathrm{Influence}(v_1, c_0 \log n) \cap \mathrm{Influence}(v_2, c_0 \log n)$ of nodes has cardinality at least $\beta n$, we can apply
Lemma 1, leading to $|\mathrm{Influence}(v_1, c_0 \log n + T) \cap \mathrm{Influence}(v_2, c_0 \log n + T)| > (1 - \beta)n$. We
can partition $V^*$ into a set $S_1$ of pairs such that for each pair, the intersection of influence sets
has cardinality more than $(1 - \beta)n$ after $c_0 \log n + T$ rounds. Similarly, we can construct a set $S_2$ of
quadruples by disjointly pairing the pairs in $S_1$. Using a similar argument, we can say that for any
$Q \in S_2$, $|\bigcap_{v \in Q} \mathrm{Influence}(v, c_0 \log n + 2T)| > (1 - \beta)n$. Progressing similarly, the set $S_{\log |V^*|}$ will
equal $V^*$ and we can conclude that

$$\Big|\bigcap_{v \in V^*} \mathrm{Influence}(v, c_0 \log n + T \log |V^*|)\Big| > (1 - \beta)n.$$

Since $|V^*| \le n$, $c_0 \log n + T \log |V^*| \in O(\log n)$, thus completing the proof. □
Lemma 4. Suppose that up to $\epsilon\sqrt{n}$ nodes can be subjected to churn in any round by an adaptive
adversary. In some $r \in O(\log^2 n)$ rounds, there is a set of unsuppressed nodes $V^* \subseteq V^0$ of cardinality
at least $n - \beta\sqrt{n}$ such that

$$\Big|\bigcap_{v \in V^*} \mathrm{Influence}(v, r)\Big| > n - \beta\sqrt{n}.$$

Proof. From Corollary 1 we know that each of the unsuppressed nodes in $V^*$ (which is of cardinality
more than $n - \beta\sqrt{n}$) will influence more than $n - \beta c(n)$ nodes in $O(\log^2 n)$ time. We can use the
same argument as in Lemma 3 to show that in $O(\log n)$ rounds, all the unsuppressed nodes have a
common influence set of size at least $\Theta(n)$. That common influence set will grow to at least $n - \beta\sqrt{n}$
nodes within another $O(\log^2 n)$ rounds. Thus a total of $O(\log^2 n)$ rounds is sufficient to fulfill the
requirements. □



3.3 Maintaining Information in the Network 

In a dynamic network with churn limit $\epsilon n$, the entire set of nodes in the network can be churned
out and new nodes churned in within $1/\epsilon$ rounds. How do the new nodes even know what algorithm
is running? How do they know how far the algorithm has progressed? To address these basic 
questions, the network needs to maintain some global information that is not lost as the nodes in 
the network are churned out. There are two basic pieces of information that need to be maintained 
so that a new node can join in and participate in the execution of the distributed algorithm: 

1. the algorithm that is currently executing, and 

2. the number of rounds that have elapsed in the execution of the algorithm. In other words, a 
global clock has to be maintained. 

We assume that the nodes in $V^0$ are all synchronized in their understanding of what algorithm to
execute and of the global clock. The nodes in the network continuously flood information on what
algorithm is running so that when a new node arrives, unless it is shielded by churn, it receives
this information and can start participating in the algorithm. To maintain the clock value, nodes
send their current clock value to their immediate neighbors. When a new node receives the clock
information from a neighbor, it sets its own clock accordingly. Since nodes are not malicious or
faulty, Lemma 1 ensures that information is correctly maintained in more than $n - \beta c(n)$ nodes.



3.4 Support Estimation Under an Oblivious Adversary 

Suppose we have a dynamic network with $\mathcal{R}$ nodes colored red in $V^0$. $\mathcal{R}$ is also called the support
of red nodes. We want the nodes in the network to estimate $\mathcal{R}$ under an oblivious adversary. We
assume that the adversary chooses $\mathcal{R}$ and which $\mathcal{R}$ nodes in $V^0$ to color red, but it does not know
the random choices made by the algorithm. Furthermore, we assume that churn can be linear in $n$,
i.e., $c(n) = n$.

We now provide our algorithm. $P \in O(\log n)$ is the number of parallel iterations performed by
our algorithm so that the precision of our estimate holds with high probability. Its
exact value is worked out in the proof of Theorem 1. At round 1, each red node in $V^0$ draws $P$
random samples $s_1, s_2, \ldots, s_i, \ldots, s_P$, each from the exponential random distribution with rate 1.
Each $s_i$ is chosen with a precision that ensures that the smallest possible positive value is at most
$n^{-\Theta(1)}$; note that $O(\log n)$ bits suffice. Each red node $u$ initiates $P$ parallel flooding messages $m_u(i)$;
each $m_u(i)$ contains $s_i$ and its terminating condition is: ⟨has encountered a message $m_v(i)$ with
a smaller random sample⟩. Note that for $i \ne j$, messages $m_u(i)$ do not interact with messages
$m_u(j)$. This allows each live node $u$ to keep track of the $P$ smallest samples that it has seen, which
we denote as $s_u(i)$ for each $i$. After some $t \in O(\log n)$ rounds, each node $u \in V^t$ computes the
average $\bar{s}_u$ over all $s_u(i)$ that it has. Each node $u$ estimates $\mathcal{R}$ to be $1/\bar{s}_u$. It is easy to see that the
number of bits transmitted per round through a link is at most $O(\log^2 n)$.
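
The estimator itself can be illustrated in isolation from the network. The following minimal sketch assumes an idealized setting with no churn and no suppression, so that every node sees the true minimum of each of the $P$ iterations; it does not model flooding. The function name `estimate_support` and the chosen parameter values are hypothetical.

```python
import random


def estimate_support(num_red, num_iterations, rng):
    """Estimate the number of red nodes from minima of exponential samples.

    Idealized sketch: in each of the num_iterations parallel iterations, every
    red node draws an Exp(1) sample and the minimum is assumed to reach the
    estimating node. The estimate is the reciprocal of the average minimum.
    """
    minima = []
    for _ in range(num_iterations):
        # Minimum over the samples drawn by all red nodes in this iteration.
        minima.append(min(rng.expovariate(1.0) for _ in range(num_red)))
    mean_minimum = sum(minima) / len(minima)
    return 1.0 / mean_minimum


if __name__ == "__main__":
    rng = random.Random(0)
    print(estimate_support(num_red=200, num_iterations=2000, rng=rng))
```

The reciprocal of the average works because the minimum of $\mathcal{R}$ independent rate-1 exponential samples is itself exponentially distributed with rate $\mathcal{R}$, so its mean is $1/\mathcal{R}$; more iterations tighten the concentration around that mean.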

To analyze this algorithm, we use two properties of exponential random variables. Consider
$K \ge 1$ independent random variables $Y_1, Y_2, \ldots, Y_K$, each following the exponential distribution of
rate $\lambda$.

Property 2 (e.g., see [IB]). The minimum among all $Y_i$, $1 \le i \le K$, is an exponentially distributed
random variable with parameter $K\lambda$.

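Property 3 (see [28] and pp. 30 and 35 of p]). Let $X_K = \frac{1}{K}\sum_{i=1}^{K} Y_i$. Then, for any $\varsigma \in (0, 1/2)$,

$$\Pr\left[\left|X_K - \frac{1}{\lambda}\right| \ge \frac{\varsigma}{\lambda}\right] \le 2\exp\left(-\frac{\varsigma^2 K}{3}\right).$$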

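Theorem 1. Consider an oblivious adversary. With high probability, $(1 - \beta)n$ nodes in the network
estimate $\mathcal{R}$

• between $(1 - \delta)\mathcal{R}$ and $(1 + \delta)\mathcal{R}$ for some arbitrarily small $\delta > 2\beta$ when $\mathcal{R}$ is large, say $\mathcal{R} \ge n/2$,
and

• between $\mathcal{R} - \delta n$ and $\mathcal{R} + \delta n$ when $\mathcal{R}$ is small, say $\mathcal{R} < n/2$.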

Proof. Suppose $\mathcal{R} \ge n/2$. Out of the $\mathcal{R}$ red nodes, up to $\beta n$ nodes (chosen obliviously) can be
suppressed, leaving us with

$$\mathcal{R}' \ge \mathcal{R} - \beta n \ge (1 - 2\beta)\mathcal{R} \tag{4}$$

unsuppressed red nodes (since $\mathcal{R} \ge n/2$). In a slight abuse of notation, we use $\mathcal{R}$ and $\mathcal{R}'$ to denote
both the cardinality and the set of red nodes and unsuppressed red nodes, respectively. Let

$$U = \Big\{u : u \in \bigcap_{v \in \mathcal{R}'} \mathrm{Influence}(v, t)\Big\}.$$

Note that $t = O(\log n)$ and $|U| \ge (1 - \beta)n$ (cf. Lemma 3). Let $u$ be some node in $U$. Let

$$V_u = \{v : v \in \mathcal{R} \wedge u \in \mathrm{Influence}(v, t)\}.$$

For all $u \in U$, $\mathcal{R}' \subseteq V_u \subseteq \mathcal{R}$. Intuitively, at round $t$, node $u$ is estimating $\mathcal{R}$ using the exponential
random numbers that were drawn by nodes in $V_u$. Since our adversary is oblivious, the choice of $V_u$
is independent of the choice of the random numbers generated by each $v \in V_u$. Therefore, $s_u(i)$ is an
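exponentially distributed random number with rate $|V_u| \ge \mathcal{R}'$ (cf. Property 2). For any $\delta > 2\beta$, let
$\varsigma$ be a sufficiently small constant depending on $\delta$ and $\beta$. When $P \in O(\log n)$ parallel iterations are
performed, with the constant factor in $P$ depending on $\varsigma$ and a parameter $c > 0$, the
required accuracy is obtained with high probability (cf. Property 3). The case where $\mathcal{R} < n/2$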
can be addressed in like manner. However, we need to allow an error range that is dependent on $n$
as up to $\beta n$ nodes can be suppressed. □

3.5 Support Estimation Under an Adaptive Adversary 

The algorithm for support estimation under an oblivious adversary (cf. Section 3.4) does not work
under an adaptive adversary. To estimate the support of red nodes in the network, each red node
draws a random number from the exponential distribution and floods it in an attempt to spread
the smallest random number. When the adversary is adaptive, the smallest random numbers can
easily be targeted and suppressed. To mitigate this difficulty, we consider a different algorithm in
which more bits are communicated. In particular, the number of bits communicated
per round by each node executing this algorithm is at most polynomial in $n$.
Let $\mathcal{R}$ be the support of the red nodes. Every node floods its unique identifier along with a bit
that indicates whether it is a red node or not. At most $\beta\sqrt{n}$ nodes' identifiers can be suppressed by
the adversary for $\Omega(\log^2 n)$ rounds, leaving at least $n - \beta\sqrt{n}$ unsuppressed identifiers (cf. Corollary 1).
Each node counts the number of unique red identifiers $A$ and non-red identifiers $B$ that flood over
it and estimates $\mathcal{R}$ to be $A + \frac{n - A - B}{2}$.
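
As a small illustration of this counting rule, the sketch below computes the midpoint estimate from the identifiers a single node has received, together with the half-width of the interval in which the true support must lie. The function name `estimate_support_from_ids` and the dictionary input are hypothetical; flooding and churn are not modeled.

```python
def estimate_support_from_ids(n, seen):
    """Estimate the red support from the identifiers that reached one node.

    n: stable network size.
    seen: dict mapping each received node identifier to True if that node is
    red and False otherwise (hypothetical representation).
    Returns (estimate, half_width): the true support lies within
    estimate - half_width and estimate + half_width.
    """
    red = sum(1 for is_red in seen.values() if is_red)  # A
    non_red = len(seen) - red                           # B
    missing = n - red - non_red                         # C = n - A - B
    # The true support is between A and A + C, so take the midpoint.
    return red + missing / 2, missing / 2


if __name__ == "__main__":
    seen = {1: True, 2: True, 3: True, 4: False, 5: False, 6: False, 7: False, 8: False}
    print(estimate_support_from_ids(10, seen))
```

A node can compare the number of missing identifiers against the suppression bound to tell whether its own estimate is trustworthy, which is the awareness property used later.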

This support estimation technique generalizes quite easily to arbitrary churn order. Therefore, 
we state the following theorem more generally. 

Theorem 2. Consider the algorithm mentioned above in which nodes flood their unique identifiers
indicating whether they are red nodes or not and assume that the network is controlled by an adaptive
adversary. Let $c(n)$ be the order of the churn; we assume for simplicity that $c(n)$ is either $n$ or $\sqrt{n}$.
Then the following holds:

1. At least $n - \beta c(n)$ nodes estimate $\mathcal{R}$ between $\mathcal{R} - \frac{\beta c(n)}{2}$ and $\mathcal{R} + \frac{\beta c(n)}{2}$. Furthermore, these
nodes are aware that their estimate is within $\mathcal{R} - \frac{\beta c(n)}{2}$ and $\mathcal{R} + \frac{\beta c(n)}{2}$.

2. The remaining nodes are aware that their estimate of $\mathcal{R}$ might fall outside $[\mathcal{R} - \frac{\beta c(n)}{2}, \mathcal{R} + \frac{\beta c(n)}{2}]$.

When $c(n) = n$, it requires only $O(\log n)$ rounds, but when $c(n) = \sqrt{n}$, it requires $O(\log^2 n)$ rounds.

Proof. Let $u$ be any one of the $n - \beta c(n)$ nodes that receive at least $n - \beta c(n)$ unsuppressed identifiers
(cf. Lemma 3 and Lemma 4). Let $A$ and $B$ be the number of unique identifiers from red nodes and
non-red nodes, respectively, that flood over $u$. Let $C = n - A - B \le \beta c(n)$. This means that $u$
estimates $\mathcal{R}$ to be $A + \frac{C}{2}$. Clearly, $A \le \mathcal{R} \le A + C$ and since $C \le \beta c(n)$, $\mathcal{R}$ is estimated between
$\mathcal{R} - \frac{\beta c(n)}{2}$ and $\mathcal{R} + \frac{\beta c(n)}{2}$. Furthermore, since $u$ received $n - \beta c(n)$ identifiers, it can be sure that its
estimate is between $\mathcal{R} - \frac{\beta c(n)}{2}$ and $\mathcal{R} + \frac{\beta c(n)}{2}$.

If a node does not receive at least $n - \beta c(n)$ identifiers, then it is aware that its estimate of $\mathcal{R}$
might not be within $[\mathcal{R} - \frac{\beta c(n)}{2}, \mathcal{R} + \frac{\beta c(n)}{2}]$.

From Lemma 3, when $c(n) = n$, the algorithm takes $O(\log n)$ rounds to complete because we want
to ensure that unsuppressed nodes have flooded the network. When $c(n) = \sqrt{n}$, as a consequence
of Lemma 4, the algorithm requires $O(\log^2 n)$ rounds. □



4 Stable Agreement Under an Oblivious Adversary 

In this section we will first present Algorithm 1 for the simpler problem of reaching Binary
Consensus, where the input values are restricted to {0, 1} (cf. Section 4.1). We will then use this
algorithm as a subroutine for solving Stable Agreement in Section 4.2.

Throughout this section we assume suitable choices of $\epsilon$ and $\alpha$ such that the upper bound

?<-5 (5) 

can be satisfied for $\beta$; note that (5) must hold in addition to bound (1). Moreover, we
assume that a node can send and process up to $O(\log^2 m)$ bits in every round, where $m$ is the size of
the input value domain.



4.1 Binary Consensus 

A node $u$ that executes Algorithm 1 proceeds in a sequence of $O(\log n)$ checkpoints that are
interleaved by $O(\log n)$ rounds. Each node $u$ has a bit variable $b_u$ that stores its current output
value. At each checkpoint $t_i$, node $u$ initiates support estimation of the number of nodes currently
having 1 as output bit by using the algorithm described in Section 3.4. (For the terminating checkpoint, nodes
estimate both the support of 1 and the support of 0.) The outcome of this support estimation will be available in
checkpoint $t_{i+1}$, where $u$ has derived the estimation #(1). If $u$ believes that the support of 1 is small
($\#(1) < \frac{1}{4}n$), $u$ sets its own output $b_u$ to 0; if, on the other hand, #(1) is large ($\#(1) > \frac{3}{4}n$), $u$ sets its output
$b_u$ to 1. This guarantees stability once agreement has been reached by a large number of nodes.
When the support of 1 is roughly the same as the support of 0, we need a way to sway the decision
to one side or the other. This is done by flooding the network whereby the flooding messages are
weighted by some randomly chosen value. The adversary can only guess which node has the highest
weight and therefore, with constant probability, the flooding message with this highest weight (i.e.,
smallest random number) will be used to set the output bit by almost all nodes in the network.

Algorithm 1 Binary Consensus under an oblivious adversary; code executed by node $u$.
Let $decision_u$ be initialized to $\bot$.
Let $b_u$ be the current output bit of $u$. Initially, for each $u \in V^0$, $b_u$ is set to the input value assigned to $u$.
Let $t_1 = 1$ be the first checkpoint round. Subsequent checkpoint rounds are given by $t_i = t_{i-1} + O(\log n)$, for
$i > 1$. Node $u$ decides at round $t_R$, for some $R = O(\log n)$, thereby requiring $O(\log^2 n)$ rounds.

At every checkpoint round $t_i$ including $t_1$:
1: Initiate support estimation (to be completed in checkpoint round $t_{i+1}$).
2: Generate a random number $r_u$ uniformly from $\{1, \ldots, n^k\}$ for suitably large but constant $k$. (With high
   probability, we want exactly one node to have generated $\min_u r_u$.)
3: Initiate flooding of $\{r_u, b_u\}$ with terminating condition: ⟨(has encountered another message initiated by
   $v \ne u$ with $r_v < r_u$) $\vee$ (current round $> t_{i+1}$)⟩.

At every checkpoint round $t_i$ except $t_1$:
4: Use the support estimation initiated at checkpoint round $t_{i-1}$. Let #(1) be $u$'s estimated support value for the
   number of nodes that had an output of 1.
5: if $\#(1) < \frac{1}{4}n$ then
6:    $b_u \leftarrow 0$.
7: else if $\#(1) > \frac{3}{4}n$ then
8:    $b_u \leftarrow 1$.
9: else if $u$ has received flooded messages initiated in $t_{i-1}$ then
10:   Let $\{r_v, b_v\}$ be the message with the smallest random number that flooded over $u$.
11:   $b_u \leftarrow b_v$.

At terminating checkpoint round $t_R$:
12: if $\#(1) > \frac{n}{2}$ then
13:   $decision_u \leftarrow 1$.
14:   Flood a 1-decision message ad infinitum.
15: else if $\#(0) > \frac{n}{2}$ then
16:   $decision_u \leftarrow 0$.
17:   Flood a 0-decision message ad infinitum.

If $u$ receives a $b$-decision message:
18: $decision_u \leftarrow b$.
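
The per-checkpoint rule in Lines 4 to 11 of Algorithm 1 can be sketched as a pure function. This is only an illustration: the function name `update_output_bit` is hypothetical, the thresholds $n/4$ and $3n/4$ are taken from the case analysis in the proof of Theorem 3 and should be treated as an assumption, and support estimation and flooding are abstracted into plain arguments.

```python
def update_output_bit(estimate_ones, n, current_bit, min_message_bit=None):
    """Return the new output bit of a node at a non-initial checkpoint.

    estimate_ones: the node's estimate #(1) from the previous checkpoint.
    n: stable network size.
    current_bit: the node's current output bit b_u.
    min_message_bit: bit attached to the flooded message with the smallest
    random number that reached the node, or None if no message arrived.
    Thresholds n/4 and 3n/4 are an assumption taken from the case analysis.
    """
    if estimate_ones < n / 4:
        return 0  # support of 1 is small
    if estimate_ones > 3 * n / 4:
        return 1  # support of 1 is large
    if min_message_bit is not None:
        return min_message_bit  # break the near-tie using the minimum message
    return current_bit  # shielded node: keep the current bit


if __name__ == "__main__":
    print(update_output_bit(10, 100, 1))
    print(update_output_bit(50, 100, 0, min_message_bit=1))
```

Keeping the rule free of network state makes the three regimes easy to see: two stable regions near the extremes, and a middle band in which the bit of the minimum-weight message decides.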



Theorem 3. Assume that the adversary is oblivious and that the churn limit per round is $\epsilon n$.
Algorithm 1 solves Binary Consensus in $O(\log^2 n)$ rounds with high probability.

Proof. We first argue that Validity holds: Suppose that all nodes start with input value 1. The only
way a node can set its output to 0 is by passing Line 5. This can happen for at most $\beta n$ nodes.
The only way that more nodes can set their output to 0 is if they estimate the support of 1 to be
in $(\frac{1}{4}n, \frac{3}{4}n)$. If $\beta$ is suitably small, Theorem 1 guarantees that with high probability this will not
happen at any node. The argument is analogous for the case where all nodes start with 0.

Next we show Almost Everywhere Agreement: Let $N_i$ be the number of nodes at checkpoint
round $t_i$ that output 1. Let $\mathrm{Lo}_i$, $\mathrm{Hi}_i$, and $\mathrm{Mid}_i$, respectively, be the sets of nodes in $V^{t_i}$ for
which $\#(1) \le \frac{1}{4}n$, $\#(1) \ge \frac{3}{4}n$, and $\frac{1}{4}n < \#(1) < \frac{3}{4}n$; note that nodes are placed in $\mathrm{Lo}_i$, $\mathrm{Hi}_i$,
and $\mathrm{Mid}_i$ based on their #(1) values, which are estimates of $N_{i-1}$, not $N_i$. Clearly, we have that
$|\mathrm{Lo}_i| + |\mathrm{Mid}_i| + |\mathrm{Hi}_i| = n$.

For some $i > 1$, let $u^* \in V^{t_{i-1}}$ be the node that generated the smallest random number in
checkpoint round $t_{i-1}$ among all nodes in $V^{t_{i-1}}$. With high probability, $u^*$ will be unique. By
Corollary 2, with probability $1 - \beta$ (a constant), $u^*$ is unsuppressed, implying that $b_{u^*}$ will be used
by all nodes in $\mathrm{Mid}_i$. Consider the following cases:

Case A ($N_{i-1} \le (\frac{1}{4} - \delta)n$): From Theorem 1, we know that with high probability $|\mathrm{Lo}_i| \ge (1 - \beta)n$,
implying $|\mathrm{Mid}_i| + |\mathrm{Hi}_i| \le \beta n$. Therefore, $N_i$ will continue to be very small, leading to small
estimates #(1) in subsequent checkpoints. After $O(\log n)$ rounds, this causes at least $(1 - \beta)n$
nodes to decide on 0, with high probability. Moreover, it is easy to see that the remaining
$\beta n$ nodes will not be able to pass Line 12 since the adversary cannot artificially increase the
estimated support of nodes with 0. (Recall from Section 3.4 that by suppressing the minimum
random variables, the adversary can only make the estimate smaller.)

Case B ($(\frac{1}{4} - \delta)n < N_{i-1} < (\frac{1}{4} + \delta)n$): With high probability, $|\mathrm{Lo}_i| + |\mathrm{Mid}_i| \ge (1 - \beta)n$, implying
$|\mathrm{Hi}_i| \le \beta n$. Note first that nodes in $\mathrm{Lo}_i$ will set their output bits to 0. Since $N_{i-1} < (\frac{1}{4} + \delta)n$,
there are at least $(\frac{3}{4} - \delta)n$ nodes in $V^{t_{i-1}}$ that output 0. Of these, at most $\beta n$ could have been
suppressed. So, with probability at least $\frac{3}{4} - \delta - \beta$, $u^*$ is an unsuppressed node that outputs 0.
When $u^*$ outputs 0, nodes in $\mathrm{Mid}_i$ will set their output bits to 0. Thus, considering $\mathrm{Lo}_i$ and
$\mathrm{Mid}_i$, we have at least $(1 - \beta)n$ nodes that set their output bits to 0 with constant probability.
For a suitably small $\delta$ and $\beta < \frac{1}{4} - \delta$, this will lead to Case A in the next iteration, which
means that subsequently nodes agree on 0.

Case C ($(\frac{1}{4} + \delta)n \le N_{i-1} \le (\frac{3}{4} - \delta)n$): With high probability, $|\mathrm{Mid}_i| \ge (1 - \beta)n$. With constant
probability $(1 - \beta)$, $u^*$ will be an unsuppressed node and nodes in $\mathrm{Mid}_i$ will set their output
bits to the same value $b_{u^*}$.

Case D ($(\frac{3}{4} - \delta)n < N_{i-1} < (\frac{3}{4} + \delta)n$): This is similar to Case B, i.e., with constant probability,
at least $(1 - \beta)n$ nodes will reach agreement on 1.

Case E ($N_{i-1} \ge (\frac{3}{4} + \delta)n$): This is similar to Case A. With high probability, at least $(1 - \beta)n$ nodes
will decide on 1.

Note that, when a checkpoint falls either under Case A or Case E, with high probability, it will
remain in that case. When a checkpoint falls under Case B, Case C, or Case D, with constant
probability, we get either Case A or Case E in the following checkpoint. Therefore, in $O(\log n)$
rounds, at least $(1 - \beta)n$ nodes will reach agreement with high probability and all other nodes
will remain undecided.

For property Stability, note that if a node has decided on some value in checkpoint $t_R$, it continues
to flood its decision message. We showed that, with high probability, at least $(1 - \beta)n$ nodes will
decide on the same bit value. Therefore, it follows by Lemma 1 that agreement will be maintained
ad infinitum among at least $(1 - \beta)n$ nodes. □

In order to use Algorithm 1 to solve Stable Agreement, we will need to make a couple of
crucial adaptations.

• Suppose every vertex in $V^0$ has some auxiliary information. We can easily adapt Algorithm 1
so that when a node $u$ decides on a bit value $b$, it also inherits the auxiliary information
of some $v \in V^0$ whose initial bit value was $b$. This adaptation is possible because our algorithm
ensures Validity.

• For a typical agreement algorithm, we assume that all nodes simultaneously start running the
consensus algorithm. We want to adapt our algorithm so that only nodes in $V^0$ that have
an initial output bit of 1 initiate the algorithm, while nodes that start with 0 are considered
passive, i.e., these nodes do not generate messages themselves, but still forward flooding
messages and start generating messages from the next checkpoint onward as soon as they notice
that an instance of the algorithm is running.

We now sketch how the algorithm can be adapted: In the first checkpoint $t_1$, each node $v$
with a 1 initiates support estimation and flooding of message $\{r_v, b_v = 1\}$. If the number of
nodes with 1 is small at checkpoint $t_1$, then, at checkpoint $t_2$, nodes that receive estimate
values will conclude 0, which will get reinforced in subsequent checkpoints. However, if the
number of nodes with a 1 at checkpoint $t_1$ is large (in particular, larger than $\beta n$), then, by
suitable flooding, most nodes (in particular, at least $(1 - \beta)n$ nodes) will know that a support
estimation is underway and will participate from checkpoint $t_2$ onward.

4.2 A 3-phase Algorithm for Stable Agreement 

We will now describe how we use Algorithm 1 as a building block for solving Stable Agreement:

Flooding Phase: In the very first round, each node $u \in V^0$ generates a uniform random number
$r_u$ from (0, 1) and if the random number is less than $\frac{\log n}{n}$, it initiates a message $m_u$ for flooding.
The message $m_u$ contains the random number $r_u$ and the value $val_u$ assigned to $u$ by the adversary.
Nodes enter the candidate selection phase (see below) after a sufficient number of rounds to ensure
that no more than $\beta n$ nodes are suppressed (see Corollary 1). However, the flooding messages go
on ad infinitum.
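
The initiation rule of the flooding phase can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flooding_initiators` is hypothetical, the logarithm is assumed to be the natural logarithm, and the actual flooding over the dynamic network is not modeled.

```python
import math
import random


def flooding_initiators(node_ids, rng):
    """Return {node: r_u} for the nodes that initiate a flooding message.

    Each node draws r_u uniformly from (0, 1) and initiates flooding only if
    r_u < log(n) / n, so about log(n) nodes initiate in expectation.
    The natural logarithm is an assumption made for this sketch.
    """
    nodes = list(node_ids)
    n = len(nodes)
    threshold = math.log(n) / n
    initiators = {}
    for u in nodes:
        r_u = rng.random()
        if r_u < threshold:
            initiators[u] = r_u  # the message m_u would carry r_u and val_u
    return initiators


if __name__ == "__main__":
    rng = random.Random(1)
    print(len(flooding_initiators(range(1000), rng)))
```

Keeping the expected number of initiators logarithmic limits the number of parallel Binary Consensus instances in the next phase while still making it likely that at least one initiator is unsuppressed.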

Candidates Selection Phase: We initiate an expected $O(\log n)$ parallel iterations of Binary
Consensus, each associated with one of the (expected) $O(\log n)$ flooding messages $m_u$. More
precisely, the instance of Binary Consensus for a particular $m_u$ is designed as follows: nodes that
have received the flooded message $m_u$ set themselves to 1 and initiate Binary Consensus. If $m_u$
has reached saturation (i.e., flooded to at least $(1 - \beta)n$ nodes), the consensus value will be 1. If $m_u$
has a very small support (say, $\beta n$), the consensus value will be 0 with high probability (cf. Case A
of the proof of Theorem 3). When the support of $m_u$ is neither too small nor too large, the nodes
will reach consensus on either 0 or 1. We say that a flooded message $m_u$ is a candidate message
if the instance of Binary Consensus associated with it reached a consensus value 1. Note that,
with high probability, $(1 - \beta)n$ nodes agree on the set of candidate messages. Among the candidate
messages, every node $v$ chooses the message $m_u$ with the smallest random number $r_u$ and value
$val_u$, and initiates a support estimation for $m_u$.

Confirmation Phase: In expectation, $\log n$ nodes initiate flooding in the Flooding phase. From
Corollary 2, each of them will not be suppressed with probability at least $(1 - \beta)$. Therefore, with
high probability, at least one node $v$ will have $|\mathrm{Influence}(v, O(\log n))| \ge (1 - \beta)n$. That is, at least
one flooded message $m_v$ will be a candidate message and therefore, when the support estimation is
initiated, a set $S$ of at least $(1 - \beta)n$ nodes will measure its support to be at least $(1 - \beta - \delta)n$ for
some $\delta > 2\beta$ with high probability (cf. Theorem 1). Due to (5), there can only be one such message
$m_v$ with high support. The nodes in $S$ will decide on the attached $val_v$ of $m_v$, whereas nodes that do
not observe that $m_v$ has high support (because of being shielded by churn nodes) remain undecided.
This shows almost everywhere agreement.

Analogously to Algorithm 1, nodes in $S$ flood their decision messages, which are adopted by
newly incoming nodes. By virtue of Lemma 1, the stability property is maintained ad infinitum.

The additional running time overhead of the above three phases excluding Algorithm 1 is only
in $O(\log n)$. Thus we have shown the following result:

Theorem 4. Consider the oblivious adversary and suppose that $\epsilon n$ nodes can be subject to churn in
every round. The 3-phase algorithm is correct with high probability and reaches Stable Agreement
in $O(\log^2 n)$ rounds.

5 Stable Agreement Under an Adaptive Adversary 

In this section we consider the Stable Agreement problem while dealing with a more powerful
adaptive adversary. At the beginning of a round $r$, this adversary observes the entire state of the
network and previous communication between nodes (including even previous outcomes of random
choices!), and thus can adapt its choice of $G^r$ to make it much more difficult for nodes to achieve
agreement.

It is instructive to consider the algorithms presented in Section 4 in this context. Both approaches
are doomed to fail in the presence of an adaptive adversary: For the Stable Agreement algorithm,
the expected number of nodes that initiate flooding in the flooding phase is $\log n$. Even though each
of these nodes would have expanded its influence set to some constant size by the end of the next
round, the adaptive adversary can spot and immediately churn out all these nodes before they can
communicate with anyone else; thus none of these values will gain any support. Simply increasing
the order of the expected number of flooding nodes to match the churn limit does not help, as this
will cause a considerable amount of congestion and therefore slow down the spreading rate of the
flooding; this in turn will cause the runtime of the algorithm to exceed $O(\log n)$.

Algorithm 1 fails for the simple reason that the adversary can selectively suppress the flooding
of the smallest generated random value $z \in \{1, \ldots, n^k\}$ with attached bit $b_z$ from ever reaching some
50% of the nodes, which instead might use a distinct minimum value $z'$ (with an attached bit value
$b_{z'} \ne b_z$) to guide their output changes.

To counter the difficulties we have mentioned, we relax the model. Firstly, we limit the order of
the churn to $\sqrt{n}$. Secondly, we allow messages of up to $O(n)$ bits to be sent over a link in a single
round. Under these relaxations, we can estimate the support of red nodes in the network simply by
flooding all the unique identifiers of the red and non-red nodes (cf. Theorem 2).

Similarly to Section 4, we will first solve Binary Consensus under these assumptions and then
show how to implement Stable Agreement. In this section we assume that the number of nodes
in the network is sufficiently large, such that

n > 4/3 2 . (6) 

Moreover, we assume that every node can send and process up to $O(n + \log m)$ bits per round,
where $m$ is the size of the input domain.

5.1 Binary Consensus 

We now describe an algorithm for solving Binary Consensus, which is similar in spirit to
Algorithm 1. The main difference is the handling of the case where the support of the nodes that
output 1 is roughly equal to the support of the nodes with output bit 0. In this case we rely on the
variance of random choices made by individual nodes to sway the balance of the support towards
one of the two sides with constant probability.
Algorithm 2 Binary Consensus under an adaptive adversary; code executed by node $u$.
Let $decision_u$ be initialized to $\bot$.
Let $b_u$ be the current output bit of $u$. Initially, for each $u \in V^0$, $b_u$ is set to the input value assigned to $u$.
Let $t_1 = 1$ be the first checkpoint round. Subsequent checkpoint rounds are given by $t_i = t_{i-1} + O(\log^2 n)$,
$i > 1$, with time between consecutive checkpoint rounds sufficient for unsuppressed nodes to reach a common
influence (cf. Lemma 4).
The algorithm terminates at round $t_R$, for some $R = O(\log n)$, thereby requiring $O(\log^3 n)$ rounds.

At every checkpoint round $t_i$ including $t_1$, but excluding $t_R$:
1: Initiate support estimation (to be completed in checkpoint round $t_{i+1}$).

At every checkpoint round $t_i$ excluding $t_1$ and $t_R$:
2: Use the support estimation initiated at checkpoint round $t_{i-1}$. Let #(1) be the estimated support value for nodes
   that output 1.

3: if support estimation is not accurate within [1Z — ^^,1Z+ ^-^-] then 

4: Do nothing. 

5: else if #(1) < % - ^S. then 

6: b u <- 0. 

7: else if #(1) > § + ^ then 
8: b u <r- 1. 

9: else 

10: if the outcome of an unbiased coin flip is heads then 

11: b u <r- 0. 

12: else 

13: bu <- 1. 

At terminating checkpoint round tn: 

14: if #(1) ^ f then 
15: decision^ ^— 1. 

16: Flood a 1-decision message ad infinitum. 

17: else if #(0) ^ f then 
18: decision u ^— 0. 

19: Flood a 0-decision message ad infinitum. 

If u receives a b-decision message: 

20: decision u b 
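
As an illustration, the update performed at a non-terminal checkpoint (Lines 3 to 13) can be sketched as a pure function. This is a minimal sketch under stated assumptions: the function name `checkpoint_update` and the `accurate` flag are hypothetical, and the margin $\beta\sqrt{n}/2$ is an assumed reading of the thresholds, which are garbled in the extracted text.

```python
import math
import random


def checkpoint_update(b_u, estimate, n, beta, accurate=True, rng=random):
    """Return node u's output bit after one non-terminal checkpoint.

    b_u:      current output bit of u.
    estimate: u's estimated support #(1) of nodes that output 1.
    accurate: whether the estimate passed the accuracy check of Line 3
              (how a node determines this is outside the scope of the sketch).
    The margin beta * sqrt(n) / 2 is an ASSUMPTION about the thresholds.
    """
    margin = beta * math.sqrt(n) / 2
    if not accurate:
        return b_u                      # Line 4: do nothing
    if estimate < n / 2 - margin:
        return 0                        # Lines 5-6: clear minority of 1s
    if estimate > n / 2 + margin:
        return 1                        # Lines 7-8: clear majority of 1s
    # Lines 9-13: support is roughly balanced, so flip an unbiased coin.
    return 0 if rng.random() < 0.5 else 1
```

Passing a seeded `random.Random` instance as `rng` makes the coin flips reproducible in experiments.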



Theorem 5. Algorithm 2 solves Binary Consensus within $O(\log^3 n)$ rounds with high probability, 
even in the presence of an adaptive adversary and up to $\varepsilon\sqrt{n}$ churn per round. 

Proof. First consider property Validity: Suppose that all nodes start with input value 1. Theorem 2 
guarantees that any node $u$ that receives insufficiently many identifiers for support estimation will 
execute Line 4 and therefore never set its output to 0. On the other hand, if $u$ does receive sufficiently 
many samples, again Theorem 2 ensures that it will always pass the if-check in Line 7. Thus, no 
node can ever output 0. The case where all nodes start with 0 can be argued analogously. 

Next, we will show that Algorithm 2 satisfies almost everywhere agreement. Let $N_i$ be the 
number of vertices at checkpoint round $t_i$ that output 1. Let $LO_i$, $HI_i$, and $MID_i$, respectively, be 
the sets of nodes in $V^{t_i}$ for which $\#(1) \le n/2 - \beta\sqrt{n}/2$, $\#(1) \ge n/2 + \beta\sqrt{n}/2$, and 
$n/2 - \beta\sqrt{n}/2 < \#(1) < n/2 + \beta\sqrt{n}/2$; note that nodes are placed in $LO_i$, $HI_i$, and $MID_i$ 
based on their $\#(1)$ values, which are estimates of $N_{i-1}$, not $N_i$. In a slight abuse of notation, 
we use $LO_i$, $MID_i$, and $HI_i$ to also refer to their respective cardinalities. Clearly, we have that 
$LO_i + MID_i + HI_i = n$. Furthermore, observe that either $LO_i$ or $HI_i$ will be 0. Otherwise, we will 
have two nodes such that one estimates $N_{i-1}$ below $n/2 - \beta\sqrt{n}/2$ while the other estimates it 
above $n/2 + \beta\sqrt{n}/2$, a violation of Theorem 2. 
Consider the following cases. 

Case A ($N_{i-1} < n/2 - \beta\sqrt{n}$): From Theorem 2, $LO_i \ge n - \beta\sqrt{n}$ and all nodes in $LO_i$ will set 
themselves to output 0. Once this case is reached in some checkpoint, it will be reached in 
all future checkpoints until $t_R$ with high probability. Therefore, the algorithm guarantees 
almost everywhere agreement on 0 in $t_R$; with high probability, nodes do not pass Line 14 in 
checkpoint $t_R$, thus no node will ever decide on 1. 

Case B ($N_{i-1} > n/2 + \beta\sqrt{n}$): This case is similar to Case A with the difference that almost all 
nodes decide on 1. 

Case C ($n/2 - \beta\sqrt{n} \le N_{i-1} \le n/2$): Notice that $HI_i = 0$. Therefore, 

$$LO_i + MID_i \ge n - \beta\sqrt{n}. \tag{7}$$

We consider two subcases: 

1. In this case, we assume that $LO_i$ is at least $n/2 + \beta\sqrt{n}$. This will set $N_i < n/2 - \beta\sqrt{n}$, 
putting the network in Case A in the next checkpoint. 

2. In this case, we assume that $LO_i < n/2 + \beta\sqrt{n}$. This implies that 
$MID_i \ge n - LO_i - \beta\sqrt{n} \ge n/2 - 2\beta\sqrt{n}$. The nodes in $MID_i$ will choose 1 or 0 with equal 
probability. The number of nodes that choose 0 follows a binomial distribution with mean $MID_i/2$ 
and standard deviation $\sqrt{MID_i}/2$. Clearly, with some constant probability, 
$MID_i/2 + \sqrt{MID_i}/2$ or more nodes in the set $MID_i$ will set themselves to output 0. 
Therefore, with constant probability, 



$$N_i < n - LO_i - \frac{MID_i}{2} - \frac{\sqrt{MID_i}}{2} \le n - LO_i - \frac{n - LO_i - \beta\sqrt{n}}{2} - \frac{\sqrt{n - LO_i - \beta\sqrt{n}}}{2}.$$



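Clearly, $N_i < n/2 - \beta\sqrt{n}$ if 

$$\begin{aligned}
3\beta\sqrt{n} &< \sqrt{n - LO_i - \beta\sqrt{n}}, \\
\Longrightarrow\quad 9\beta^2 n &< n - LO_i - \beta\sqrt{n}, \\
\Longrightarrow\quad LO_i + \beta\sqrt{n} &< n - 9\beta^2 n.
\end{aligned}$$

We know that $LO_i < n/2 + \beta\sqrt{n}$. Therefore, $N_i < n/2 - \beta\sqrt{n}$ if 

$$\begin{aligned}
\frac{n}{2} + 2\beta\sqrt{n} &< n - 9\beta^2 n, \\
\Longrightarrow\quad 2\beta\sqrt{n} &< \frac{n}{2} - 9\beta^2 n, \\
\Longrightarrow\quad 2\beta &< \left(\frac{1}{2} - 9\beta^2\right)\sqrt{n}.
\end{aligned}$$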



In other words, as long as 

$$n > \frac{4\beta^2}{\left(\frac{1}{2} - 9\beta^2\right)^2}, \tag{8}$$



with constant probability, $N_i < n/2 - \beta\sqrt{n}$, which will put the network in Case A at the 
next checkpoint round. The bound (6) guarantees that Condition (8) is easily met. 

Case D ($n/2 < N_{i-1} \le n/2 + \beta\sqrt{n}$): Using arguments similar to Case C, we can show that with 
constant probability, $N_i > n/2 + \beta\sqrt{n}$, thereby putting the network in Case B. 
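
The two quantitative ingredients of Cases C and D can be checked numerically. The sketch below is illustrative only, and its function names are hypothetical. It computes the exact probability that a Binomial$(M, 1/2)$ variable is at least one standard deviation above its mean, and it evaluates the right-hand side of the condition $n > 4\beta^2 / (1/2 - 9\beta^2)^2$ derived in Case C.

```python
import math


def tail_at_least_one_sd(m):
    """Exact P[X >= m/2 + sqrt(m)/2] for X ~ Binomial(m, 1/2)."""
    threshold = math.ceil(m / 2 + math.sqrt(m) / 2)
    favourable = sum(math.comb(m, k) for k in range(threshold, m + 1))
    return favourable / 2 ** m


def min_n_for_condition(beta):
    """Right-hand side 4*beta^2 / (1/2 - 9*beta^2)^2 of the Case C condition."""
    denom = 0.5 - 9 * beta ** 2
    if denom <= 0:
        raise ValueError("the derivation requires 9 * beta^2 < 1/2")
    return 4 * beta ** 2 / denom ** 2


# Tail probabilities for a few sizes of the undecided set MID_i.
tails = {m: tail_at_least_one_sd(m) for m in (100, 400, 1600)}
# Smallest admissible n for the (assumed) example value beta = 0.1.
bound = min_n_for_condition(0.1)
```

For the sizes tried here the tail probability stays bounded away from zero, consistent with the "constant probability" claim. For $\beta = 0.1$ the bound on $n$ is below 1, so the condition holds for any network size.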

Clearly, after $O(\log n)$ checkpoint rounds, with high probability, the network will reach either Case 
A or Case B and hence achieve almost everywhere agreement on either 0 or 1. 
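
The drift from a balanced state towards Case A or Case B can be observed in a heavily idealized simulation. The sketch below is not the algorithm itself: it assumes no churn, assumes that every node learns $N_{i-1}$ exactly, and uses an assumed margin of $\beta\sqrt{n}/2$. The name `simulate_checkpoints` is hypothetical.

```python
import math
import random


def simulate_checkpoints(n, beta, ones, max_checkpoints, rng):
    """Track the number of nodes outputting 1 across checkpoints.

    Idealization (ASSUMPTION): all n nodes see the exact previous count,
    so they all take the same branch of the update rule.
    Returns the list of counts, starting with the initial count `ones`.
    """
    margin = beta * math.sqrt(n) / 2
    history = [ones]
    for _ in range(max_checkpoints):
        if ones < n / 2 - margin:
            ones = 0                    # everybody switches to 0
        elif ones > n / 2 + margin:
            ones = n                    # everybody switches to 1
        else:
            # Balanced support: every node flips an unbiased coin.
            ones = sum(rng.random() < 0.5 for _ in range(n))
        history.append(ones)
        if ones in (0, n):
            break                       # unanimous, hence stable
    return history


history = simulate_checkpoints(10000, 0.1, 5000, 200, random.Random(1))
```

Starting from a perfect split, the random coin flips move the count outside the narrow balanced band after a few checkpoints, after which all nodes adopt the same bit. The real analysis must additionally cope with churn and inaccurate estimates, which is why it needs $O(\log n)$ checkpoints for a high-probability guarantee.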

For property Stability, note that if a node has decided on some value $\ne \bot$ in checkpoint $t_R$, it 
continues to flood its decision message. Since at least $(1 - \beta)n$ nodes have decided, it follows by Lemma [J 
that any nodes that have been churned in will also decide on this value within a constant number 
of rounds, thus agreement will be maintained ad infinitum. □ 

5.2 Stable Agreement 

Now that we have a solution for Binary Consensus, we will show how to use it to solve Stable 
Agreement where nodes have input values from some set $\{0, \dots, m\}$, for $m \ge 1$. Given some 
input value VAL, we can write it in the base-2 number system as $(b_0, \dots, b_{\log m})$ where $b_i \in \{0, 1\}$, 
for $0 \le i \le \log m$. We call VAL a general input value and $b_i$ a binary input value. 

The basic idea of the Stable Agreement algorithm is to run an instance of the Binary 
Consensus algorithm for each $b_i$ and then combine the agreed bits to obtain agreement on the 
general input values. More specifically, in the first instance every node uses the bit $b_0$ of its general 
input value as binary input for the Binary Consensus algorithm. We need to be careful, however, 
not to violate the validity property of Stable Agreement. Thus we assume that every node sends 
its general input value along with the input bit. When the Binary Consensus instance of node $u$ 
decides on some bit value $b$, node $u$ overwrites its general input value with the input value $VAL_b$ that 
was sent along with $b$. For the next instance of Binary Consensus, $u$ uses the second bit of $VAL_b$ 
and so on. After $\log m$ such instances, we can be sure that the sequence of binary decision values 
corresponds to the bit value of some general input value, thus guaranteeing validity. Stability and 
Almost Everywhere Agreement follow from the properties of Binary Consensus. 
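
The bit-by-bit reduction can be sketched in a centralized, churn-free form. Several things here are assumptions for illustration: the names `stable_agreement` and `majority` are hypothetical, `majority` is only a stand-in for a Binary Consensus instance, bit $b_0$ is taken to be the least significant bit, and only nodes whose bit differs from the decision adopt a witness value.

```python
def majority(bits):
    """Hypothetical stand-in for Binary Consensus: returns a bit held by some node."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def stable_agreement(values, num_bits, binary_consensus=majority):
    """Agree on one general input value by running one binary instance per bit.

    values:   general input values, one per node.
    num_bits: number of bits needed to write any input value.
    Returns the list of values held by the nodes after all instances.
    """
    current = list(values)
    for i in range(num_bits):
        # Each node's binary input is bit i of the value it currently holds.
        bits = [(v >> i) & 1 for v in current]
        b = binary_consensus(bits)
        # Values that were "sent along" with the decided bit b.
        witnesses = [v for v, bit in zip(current, bits) if bit == b]
        # A node whose bit differs from b adopts such a witness value, so every
        # held value stays a genuine input that matches all bits decided so far.
        current = [v if bit == b else witnesses[0]
                   for v, bit in zip(current, bits)]
    return current


result = stable_agreement([5, 3, 6, 5], num_bits=3)
```

Because nodes only ever adopt values that some node actually held, the final common value is one of the original inputs, which is the validity requirement the paragraph above is concerned with.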

Theorem 6. Suppose that the network is controlled by an adaptive adversary who can subject up 
to $\varepsilon\sqrt{n}$ nodes to churn in every round. There is an algorithm that solves Stable Agreement in 
$O(\log m \log^3 n)$ rounds. 

6 Impossibility of a Deterministic Solution 

In this section we show that there is no deterministic algorithm to solve Stable Agreement even 
when the churn is restricted to only a constant number of nodes per round. As a consequence, 
randomization is a necessity for solving Stable Agreement. 

We introduce some well-known standard notation (see [3, Chap. 5]) used for showing impossibility 
results for agreement problems. The configuration $C^r$ of the network at round $r$ consists 
of 

• the graph of the network at that point in time, and 

• the local state of each node in the network. 

A specific run $\rho$ of some Stable Agreement algorithm $A$ is entirely determined by an infinite 
sequence of configurations $C^0, C^1, \dots$ where $C^0$ contains the initial state of the graph before the 
first round. Consider the input value domain $\{0, 1\}$. A configuration $C^r$ is 1-valent (resp., 0-valent) 
if all possible runs of $A$ that share the common prefix up to and including $C^r$ lead to an agreement 
value of 1 (resp., 0). Note that this decision value refers to the decision of the large majority of 
nodes; strictly speaking, a small fraction of nodes might remain undecided on $\bot$. A configuration 
is univalent if it is either 1-valent or 0-valent. Any configuration that is not univalent is called a 
bivalent configuration. 

3 Due to Equation (6), we know that Cases A and B exist. 

Lemma 5. Consider a bivalent configuration $C^r$ in round $r$ reached by an algorithm $A$ that solves 
Stable Agreement and ensures Almost Everywhere Agreement. No node in $V^r$ can have decided 
on a value $\ne \bot$ by round $r$. 

Proof. Assume in contradiction that some node $u$ has already decided on 0 in some bivalent 
configuration $C^r$. Then, by the Almost Everywhere Agreement property, no other node $v$ can ever 
decide on 1 in the same run. But this means that $C^r$ is actually a univalent configuration, yielding 
a contradiction. □ 

Theorem 7. Suppose that the sequence of graphs $(G^r)_{r \ge 0}$ is an expander family with degree $\Delta$. 
Assume that the churn is limited to at most $\Delta + 1$ nodes per round. There is no deterministic 
algorithm that solves Stable Agreement if the network is controlled by an adaptive adversary. 

Proof. We use an argument that is similar to the argument used in the proof that $f + 1$ rounds 
are required for consensus in the presence of $f$ faults (cf. [3, Chap. 5]). For the purpose of this 
impossibility proof, we restrict the input domain of nodes to $\{0, 1\}$ and allow arbitrary congestion 
on the communication channels. Moreover, we assume that the topology of the network is fixed 
throughout the run. Thus the adversary can only "replace" nodes at the same position by some 
other nodes. 

For the sake of contradiction, assume that such a deterministic algorithm $A$ exists that solves 
Stable Agreement under the assumed settings. We will prove our theorem by inductively 
constructing an infinite run $\rho$ of this algorithm consisting of a sequence of bivalent configurations. By 
virtue of Lemma 5, this allows us to conclude that nodes do not reach almost everywhere agreement. 

To establish the basis of our induction, we need to show that there is an initial bivalent 
configuration $C^0$ at the start of round 1. Assume in contradiction that there is no bivalent starting 
configuration. Clearly, if all nodes start with a value 0 (resp., 1), this network must reach Stable 
Agreement on 0 (resp., 1). This implies that there are two possible starting configurations $C_0^0$ 
and $C_1^0$ in which (i) the input values are the same for all but one node $u^0$, but (ii) $C_0^0$ is 0-valent 
whereas $C_1^0$ is 1-valent. Consider the respective one-round extensions of $C_0^0$ and $C_1^0$ where the 
adversary simply churns out node $u^0$. Both successor configurations $C_0^1$ and $C_1^1$ are indistinguishable 
for all other nodes; in particular, they have no way of knowing what initial value was assigned to 
$u^0$, since all witnesses have been removed by the adversary. Therefore, $C_0^1$ and $C_1^1$ must both be 
either 0-valent or 1-valent, a contradiction. This shows that there is an initial bivalent configuration, 
thereby establishing the basis for our induction. 
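
The existence of two neighbouring starting configurations with different valencies follows from a discrete "intermediate value" walk, which can be sketched as follows. The function `valency` is a hypothetical oracle that maps a vector of input bits to 0 or 1; such an oracle is exactly what the contradiction assumption (every starting configuration is univalent) provides, and it is not something an algorithm could compute in practice.

```python
def find_valency_flip(n, valency):
    """Find two input vectors differing in one node's input but not in valency.

    Walks from the all-0 vector to the all-1 vector, changing one input per
    step, and returns (previous, current, index) for the first step at which
    the valency changes; `index` is the position of the differing node.
    Returns None if the valency never changes along the walk.
    """
    previous = (0,) * n
    for k in range(1, n + 1):
        current = (1,) * k + (0,) * (n - k)
        if valency(previous) != valency(current):
            return previous, current, k - 1
        previous = current
    return None


# Toy oracle (ASSUMPTION): the configuration is 1-valent iff at least 3 inputs are 1.
flip = find_valency_flip(5, lambda inputs: 1 if sum(inputs) >= 3 else 0)
```

Validity forces the all-0 vector to be 0-valent and the all-1 vector to be 1-valent, so the walk must encounter a flip. The node at the returned index plays the role of $u^0$, the node the adversary churns out.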

For the inductive step, we assume that the network is in a bivalent configuration $C^{r-1}$ at the 
end of round $r - 1$. We will extend $C^{r-1}$ by one round (guided by the adversary) that yields another 
bivalent configuration $C^r$. Assume for the sake of a contradiction that every possible one-round 
extension of $C^{r-1}$ yields a univalent configuration. Without loss of generality, assume that the 
one-round extension $\gamma$ where no node is churned out is 1-valent and yields configuration $C_1^r$. Since 
by assumption $C^{r-1}$ was bivalent, there is another one-round extension $\gamma'$ that yields a 0-valent 
configuration $C_0^r$. Moreover, we know that a nonempty set $S$ of size at most $\Delta + 1$ nodes must have 
been subject to churn in $\gamma'$. (This is the only difference between $C_0^r$ and $C_1^r$; recall that the edges 
of the graph are stable throughout the run.) 

Let $S'$ be a subset of $S$ and let $\gamma_{S'}$ be the one-round extension of $C^{r-1}$ that we get when only 
nodes in $S'$ are churned out. Clearly, $\gamma = \gamma_{\emptyset}$ and $\gamma' = \gamma_S$. Consider the lattice of all such one-round 
extensions bounded by $\gamma$ and $\gamma'$ that is given by the power set of $S$. Starting at $\gamma$ and moving 
towards $\gamma'$ along some path, we must reach a one-round extension $\gamma_{\{v_1, \dots, v_k\}}$ that yields a 1-valent 
configuration $D_1^r$, whereas the next point on this path is some one-round extension $\gamma_{\{v_1, \dots, v_{k+1}\}}$ that 
ends in a 0-valent configuration $D_0^r$. The only difference between these two extensions is that node 
$v_{k+1}$ is churned out in the latter but not in the former extension. Now consider the one-round 
extensions of $D_0^r$ and $D_1^r$ where $v_{k+1}$ and all its neighbors are churned out, yielding $D_0^{r+1}$ and 
$D_1^{r+1}$. For all other nodes, $D_0^{r+1}$ and $D_1^{r+1}$ are indistinguishable and therefore they must either 
both be 0-valent or both be 1-valent. This, however, is a contradiction. □ 

Considering that expander graphs usually are assumed to have constant degree, Theorem 7 
implies that even if we limit the churn to a constant, the adaptive adversary can still beat any 
deterministic algorithm. 



7 Conclusion 

We have introduced a novel framework for analyzing highly dynamic distributed systems with churn. 
We believe that our model captures the core characteristics of such systems: a large amount of churn 
per round and a constantly changing network topology. Future work involves extending our model 
to include Byzantine nodes and corrupted communication channels. Furthermore, our work raises 
some key questions: How much churn can we tolerate in an adaptive setting? Are there algorithms 
that tolerate linear (in $n$) churn in an adaptive setting? We show that we can tolerate $O(\sqrt{n})$ churn 
in an adaptive setting, but it takes a polynomial (in $n$) number of communication bits per round. 
An intriguing problem is to reduce the number of bits to polylogarithmic in $n$. 

While the main focus of this paper was achieving agreement among nodes, which is one of the 
most important tasks in a distributed system, we believe that the techniques we have developed are 
useful building blocks for tackling other tasks like aggregation or leader election in this setting. 